Elasticsearch
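Elasticsearch is an open-source search engine built on Apache Lucene. Lucene is arguably the most advanced, best-performing, and most feature-complete search engine library available, open source or proprietary, but it is also very complex. Elasticsearch hides that complexity behind a RESTful API and makes search much simpler. Elasticsearch is more than a search engine, though: it is also a document-oriented NoSQL store. It can be described as:
- A distributed real-time document store in which every field can be indexed and searched
- A distributed real-time analytics search engine
- A system that scales out to clusters of hundreds of servers and handles petabytes of structured or unstructured data
All of these capabilities are packaged in a single service and exposed through a RESTful API, so they can be used from any programming language.
Basic components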
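- Index: a container for documents. In other words, an index is a collection of documents that share similar properties, comparable to a table. Index names must be lowercase. In the 5.x release used here, an index has 5 primary shards by default, and each primary shard has one replica by default
- Type: a logical partition of an index whose meaning is entirely up to the user. An index can define one or more types; generally, a type is a predefined set of documents that share the same fields
- Document: the atomic unit of indexing and searching in Lucene. It contains one or more fields, that is, it is a container of fields, and it is represented as JSON
- Mapping: before raw content is stored as a document it has to be analyzed, for example tokenized and stripped of certain words. A mapping defines how that analysis is carried out; ES mappings also control features such as sorting on the content of a field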
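ES cluster components
- Cluster: an ES cluster is identified by its cluster name, which defaults to 'elasticsearch'. A node uses this name to decide which cluster to join, and a node can belong to only one cluster.
- Node: a host running a single ES instance. A node stores data and takes part in the cluster's indexing and search operations; it is identified by its node name.
- Shard: an index is split into shards, which are the physical storage units, and each shard is itself an independent, complete index. When an index is created, ES splits it into 5 shards by default; the number can be set as needed but cannot be changed once the index exists. There are two kinds of shard: primary shards and replicas. Replicas provide data redundancy and load balancing for queries; the number of replicas per primary shard is configurable and can be changed dynamically.
When an ES cluster starts, each node looks for other nodes of the same cluster on 9300/tcp and communicates with them over that port. Older releases supported both multicast and unicast discovery; the 5.x release used in this article relies on unicast. All nodes in the cluster elect a master node, which manages the cluster state and decides how shards are distributed across the cluster. From the user's point of view, every node accepts and responds to all kinds of requests.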
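Why the number of primary shards is fixed at creation time becomes clearer with a small sketch of document routing. ES picks a shard as hash(routing key) modulo the number of primary shards; the code below assumes CRC32 as a stand-in hash (ES uses its own Murmur3-based function), and `pick_shard` is a hypothetical helper, not an ES API.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard number."""
    # Assumption: CRC32 stands in for the hash function that ES really uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Same document IDs, two different shard counts: placements generally differ.
    for doc_id in ["1", "2", "3", "4"]:
        print(doc_id, pick_shard(doc_id, 5), pick_shard(doc_id, 6))
```

Because the shard number depends on the shard count, changing the count would send lookups for existing documents to the wrong shard, which is why an index has to be reindexed instead.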
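ES cluster status:
- green: all primary shards and all replica shards are available
- yellow: all primary shards are available, but not all replica shards are
- red: not all primary shards are available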
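The three states can be expressed as a small decision rule. The sketch below is only an illustration of the definitions above: `cluster_status` is a hypothetical function and the tuple layout is an assumption, not how ES stores shard state.

```python
def cluster_status(shards):
    """Derive the status color from (is_primary, is_assigned) tuples, one per shard copy."""
    shards = list(shards)
    # Any unavailable primary shard makes the cluster red.
    if any(is_primary and not assigned for is_primary, assigned in shards):
        return "red"
    # All primaries are fine, but at least one replica is unavailable.
    if any(not assigned for _is_primary, assigned in shards):
        return "yellow"
    return "green"


if __name__ == "__main__":
    print(cluster_status([(True, True), (False, True)]))   # everything assigned
    print(cluster_status([(True, True), (False, False)]))  # a replica is missing
    print(cluster_status([(True, False), (False, True)]))  # a primary is missing
```

A cluster with no indices at all has no unassigned shards, so under this rule it also reports green.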
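Inverted index
The inverted index is a core Lucene concept and the main reason ES can retrieve content so quickly. It comes from the practical need to look up records by the value of an attribute: each entry in the index holds an attribute value and the locations of the records that have that value. Because the attribute determines the record location, rather than the record determining the attribute value, it is called an inverted index.
When a piece of data is stored for search, Lucene first tokenizes it, and each resulting term is treated as an attribute of that data. The data is stored as term-to-document pairs; at search time the terms are looked up, and each term leads to the documents that contain it. Matches are also weighted: the better a document matches, the higher it ranks, much like the web search engines we use every day. That is the basic idea of an inverted index.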
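The idea can be sketched in a few lines of Python. This is a toy illustration, not Lucene's implementation: the sample documents are made up, and the score is simply the number of query terms a document contains, which is an assumption standing in for Lucene's real relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return matching document IDs, best match first (score = matched terms)."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents.
docs = {
    1: "zhang guoxing studies English",
    2: "jack studies Math",
    3: "zhang likes Math",
}
index = build_inverted_index(docs)
print(search(index, "zhang math"))  # document 3 contains both terms, so it ranks first
```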
Installing Elasticsearch
Elasticsearch is written in Java, so a Java runtime (OpenJDK or Oracle JDK) has to be installed first. The 5.x release used here requires at least JDK 1.8.
~]# yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
~]# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.0.rpm
~]# rpm -ivh elasticsearch-5.5.0.rpm

Elasticsearch is demanding on system resources, so some default system parameters need to be changed:
Problem 1:
java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
    at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:363) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:638) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:215) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:99) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:111) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:351) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) [elasticsearch-5.5.0.jar:5.5.0]

This is only a warning. It goes away with a newer kernel and does not affect normal use.

Problem 2:
max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
~]# vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536

Problem 3:
max number of threads [1024] for user [elasticsearch] is too low, increase to at least [2048]
~]# vim /etc/security/limits.d/90-nproc.conf
* soft nproc 2048
* hard nproc 2048

Problem 4:
max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]
~]# vim /etc/sysctl.conf
vm.max_map_count=655360
~]# sysctl -p

Problem 5:
system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
~]# vim /etc/elasticsearch/elasticsearch.yml
bootstrap.system_call_filter: false
elasticsearch.yml configuration:
~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: MyES    # cluster name; nodes with the same cluster name form one cluster
node.name: node1    # node name
#node.attr.rack: r1    # custom node attribute
path.data: /data/elastic    # data directory
path.logs: /data/elastic/log    # log directory
network.host: 0.0.0.0    # IP address to bind to
http.port: 9200    # port of the RESTful API
transport.tcp.port: 9300    # port for inter-node cluster communication
discovery.zen.ping.unicast.hosts: ["10.211.55.48", "10.211.55.49"]    # unicast host list used to discover cluster members
discovery.zen.minimum_master_nodes: 2    # quorum for master election: (master-eligible nodes / 2) + 1; an odd number of nodes (3 or more) is recommended, two nodes are used here only as an experiment
gateway.recover_after_nodes: 2    # number of nodes that must be up before recovery starts after a cluster restart
action.destructive_requires_name: true    # require exact index names when deleting indices
~]# service elasticsearch start
~]# tail -f /data/elastic/log/MyES.log
[2017-05-16T12:27:12,599][WARN ][o.e.d.z.ZenDiscovery ] [node2] not enough master nodes discovered during pinging (found [[Candidate{node={node2}{HYFYyQ31QmatkqJXoKsNCw}{ikvdDqarTHm-aWCX_89CcQ}{10.211.55.49}{10.211.55.49:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2017-05-16T12:27:15,600][WARN ][o.e.d.z.ZenDiscovery ] [node2] not enough master nodes discovered during pinging (found [[Candidate{node={node2}{HYFYyQ31QmatkqJXoKsNCw}{ikvdDqarTHm-aWCX_89CcQ}{10.211.55.49}{10.211.55.49:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2017-05-16T12:27:18,441][WARN ][o.e.n.Node ] [node2] timed out while waiting for initial discovery state - timeout: 30s
[2017-05-16T12:27:18,460][INFO ][o.e.h.n.Netty4HttpServerTransport] [node2] publish_address {10.211.55.49:9200}, bound_addresses {[::]:9200}
[2017-05-16T12:27:18,460][INFO ][o.e.n.Node ] [node2] started
[2017-05-16T12:27:18,602][WARN ][o.e.d.z.ZenDiscovery ] [node2] not enough master nodes discovered during pinging (found [[Candidate{node={node2}{HYFYyQ31QmatkqJXoKsNCw}{ikvdDqarTHm-aWCX_89CcQ}{10.211.55.49}{10.211.55.49:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2017-05-16T12:27:21,604][WARN ][o.e.d.z.ZenDiscovery ] [node2] not enough master nodes discovered during pinging (found [[Candidate{node={node2}{HYFYyQ31QmatkqJXoKsNCw}{ikvdDqarTHm-aWCX_89CcQ}{10.211.55.49}{10.211.55.49:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2017-05-16T12:27:24,606][WARN ][o.e.d.z.ZenDiscovery ] [node2] not enough master nodes discovered during pinging (found [[Candidate{node={node2}{HYFYyQ31QmatkqJXoKsNCw}{ikvdDqarTHm-aWCX_89CcQ}{10.211.55.49}{10.211.55.49:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2017-05-16T12:27:35,256][INFO ][o.e.c.s.ClusterService ] [node2] new_master {node2}{HYFYyQ31QmatkqJXoKsNCw}{ikvdDqarTHm-aWCX_89CcQ}{10.211.55.49}{10.211.55.49:9300}, added {{node1}{HrlO474CRxK0XJv_0w4cvg}{PlIc8KhdRKKTmWTK5bRfhQ}{10.211.55.48}{10.211.55.48:9300},}, reason: zen-disco-elected-as-master ([1] nodes joined)[{node1}{HrlO474CRxK0XJv_0w4cvg}{PlIc8KhdRKKTmWTK5bRfhQ}{10.211.55.48}{10.211.55.48:9300}]
[2017-05-16T12:27:35,373][INFO ][o.e.g.GatewayService ] [node2] recovered [0] indices into cluster_state
~]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN  2504/sshd
tcp        0      0 127.0.0.1:25     0.0.0.0:*        LISTEN  2654/master
tcp        0      0 :::9200          :::*             LISTEN  14495/java
tcp        0      0 :::9300          :::*             LISTEN  14495/java
tcp        0      0 :::22            :::*             LISTEN  2504/sshd
tcp        0      0 ::1:25           :::*             LISTEN  2654/master
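The quorum rule behind discovery.zen.minimum_master_nodes can be checked with a little arithmetic. The sketch below only restates the (master-eligible nodes / 2) + 1 formula from the configuration above; both function names are hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size: more than half of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerable_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a master can still be elected."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, minimum_master_nodes(n), tolerable_failures(n))
```

With two nodes the quorum is 2, so losing either node blocks master election, which matches the "not enough master nodes" warnings logged while node2 was still alone. Three nodes tolerate one failure and four nodes tolerate no more than three do, which is why an odd node count is the usual recommendation.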
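RESTful API
Four categories of API:
- (1) Checking the health of the cluster, nodes, and indices, and retrieving their status
- (2) Managing the cluster, nodes, indices, and metadata
- (3) Performing CRUD operations
- (4) Performing advanced operations such as paging and filtering
ES access port: TCP/9200
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'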
First, check the state of the cluster and its nodes:
~]# curl '10.211.55.49:9200/'
{
  "name" : "node2",
  "cluster_name" : "MyES",
  "cluster_uuid" : "VvFCdamHRJWoX8NJIK76Qw",
  "version" : {
    "number" : "5.5.0",
    "build_hash" : "260387d",
    "build_date" : "2017-06-30T23:16:05.735Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
~]# curl '10.211.55.48:9200/'
{
  "name" : "node1",
  "cluster_name" : "MyES",
  "cluster_uuid" : "VvFCdamHRJWoX8NJIK76Qw",
  "version" : {
    "number" : "5.5.0",
    "build_hash" : "260387d",
    "build_date" : "2017-06-30T23:16:05.735Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}

Both nodes belong to the MyES cluster. As the tagline says, "You Know, for Search": this is a cluster built for searching large volumes of data.

~]# curl -XGET "http://10.211.55.48:9200/_cluster/health?pretty"
{
  "cluster_name" : "MyES",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The cluster is in the green state, which means all primary shards and replicas are available.

~]# curl -XGET "http://10.211.55.48:9200/_cluster/state?pretty"
{
  "cluster_name" : "MyES",
  "version" : 2,
  "state_uuid" : "Twy26y7dTtqilrjUEmDalQ",
  "master_node" : "HYFYyQ31QmatkqJXoKsNCw",
  "blocks" : { },
  "nodes" : {
    "HrlO474CRxK0XJv_0w4cvg" : {
      "name" : "node1",
      "ephemeral_id" : "PlIc8KhdRKKTmWTK5bRfhQ",
      "transport_address" : "10.211.55.48:9300",
      "attributes" : { }
    },
    "HYFYyQ31QmatkqJXoKsNCw" : {
      "name" : "node2",
      "ephemeral_id" : "ikvdDqarTHm-aWCX_89CcQ",
      "transport_address" : "10.211.55.49:9300",
      "attributes" : { }
    }
  },
  "metadata" : {
    "cluster_uuid" : "VvFCdamHRJWoX8NJIK76Qw",
    "templates" : { },
    "indices" : { },
    "index-graveyard" : {
      "tombstones" : [ ]
    }
  },
  "routing_table" : {
    "indices" : { }
  },
  "routing_nodes" : {
    "unassigned" : [ ],
    "nodes" : {
      "HrlO474CRxK0XJv_0w4cvg" : [ ],
      "HYFYyQ31QmatkqJXoKsNCw" : [ ]
    }
  }
}
This is the cluster state information.

~]# curl "10.211.55.49:9200/_nodes/node1/state?pretty"
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "MyES",
  "nodes" : {
    "HrlO474CRxK0XJv_0w4cvg" : {
      "name" : "node1",
      "transport_address" : "10.211.55.48:9300",
      "host" : "10.211.55.48",
      "ip" : "10.211.55.48",
      "version" : "5.5.0",
      "build_hash" : "260387d",
      "roles" : [
        "master",
        "data",
        "ingest"
      ]
    }
  }
}

If raw JSON is hard to read, ES also provides the _cat API:
~]# curl -XGET "http://10.211.55.48:9200/_cat"
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

Calling _cat without a sub-path lists the views it offers.

~]# curl -XGET "http://10.211.55.48:9200/_cat/nodes?v"
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.211.55.48            5          94   0    0.35    0.31     0.34 mdi       -      node1
10.211.55.49            7          94   0    0.08    0.30     0.34 mdi       *      node2
The output shows that the master node is node2.
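Plugin
Many ES features are provided by plugins, and some plugins are practically essential. Commonly used ones include:
- marvel
- bigdesk
- head
- kopf
These are all site plugins that let you manage an ES cluster from a web page. Note that support for site plugins was removed in ES 5.0, so on the 5.5.0 release used in this article these tools cannot be installed as site plugins; check each project's documentation for a 5.x-compatible way to run it.
How is a plugin installed?
- Place the plugin directly in the plugins directory: /usr/share/elasticsearch/plugins
- Install it with the elasticsearch-plugin tool: /usr/share/elasticsearch/bin/elasticsearch-plugin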
CRUD
Create
~]# curl -XPUT "10.211.55.48:9200/students/class1/1?pretty" -d '
{
  "name": "jusene",
  "age": 25,
  "class": "English"
}'
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}
~]# curl -XPUT "10.211.55.48:9200/students/class2/1?pretty" -d '
{
  "name": "jack",
  "age": 24,
  "class": "Math"
}'
{
  "_index" : "students",
  "_type" : "class2",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}
Read
~]# curl -XGET "10.211.55.48:9200/students/class1/1?pretty"
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "jusene",
    "age" : 25,
    "class" : "English"
  }
}
Update
~]# curl -XPOST "10.211.55.48:9200/students/class1/1/_update?pretty" -d '{"doc": {"age": 26}}'
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  }
}
~]# curl -XGET "10.211.55.48:9200/students/class1/1?pretty"
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "jusene",
    "age" : 26,
    "class" : "English"
  }
}

Note: sending a PUT to the same ID replaces the whole document instead of updating individual fields.
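The difference between a partial update and a PUT can be modeled with a plain dictionary. This is only an in-memory sketch: `put_document` and `update_document` are hypothetical names, and the merge here is shallow, whereas the real _update API also merges nested objects.

```python
def put_document(store, doc_id, body):
    """Model of PUT /index/type/id: the stored document is replaced as a whole."""
    store[doc_id] = dict(body)


def update_document(store, doc_id, partial):
    """Model of POST .../_update with {"doc": partial}: fields are merged into the document."""
    store[doc_id] = {**store[doc_id], **partial}


if __name__ == "__main__":
    store = {}
    put_document(store, "1", {"name": "jusene", "age": 25, "class": "English"})
    update_document(store, "1", {"age": 26})
    print(store["1"])  # name and class survive, only age changes
    put_document(store, "1", {"age": 27})
    print(store["1"])  # name and class are gone
```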
Delete
~]# curl -XDELETE '10.211.55.48:9200/students/class1/1?pretty'
{
  "found" : true,
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "_version" : 3,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  }
}
~]# curl -XGET "10.211.55.48:9200/students/class1/1?pretty"
{
  "_index" : "students",
  "_type" : "class1",
  "_id" : "1",
  "found" : false
}

An index can be deleted the same way. Deleting a whole type with DELETE /students/class1 only worked on ES 1.x; the delete-mapping API was removed in ES 2.0, so on 5.5.0 either delete the documents of that type or delete the whole index:

~]# curl -XDELETE '10.211.55.48:9200/students?pretty'
Querying data
Query API:
- Query DSL: a JSON-based language for building complex queries, used to implement many kinds of query such as simple term, phrase, range, boolean, and fuzzy queries
- Multi-index and multi-type queries
Multi-index and multi-type queries
/_search: all indices
/INDEX_NAME/_search: a single index
/INDEX1,INDEX2/_search: multiple indices
/s*,t*/_search: all indices whose names start with s or t
/students/class1/_search: a single type
/students/class1,class2/_search: multiple types
For every document, ES takes all values of all fields and builds a field named _all from them. When a query_string query does not specify a field, the search runs against the _all field.
For example:
- GET /_search?q='zgx'
- GET /_search?q='zhang%20guoxing'
- GET /_search?q=name:'zgx'
- GET /_search?q=name:'zhang%20guoxing'
~]# curl "10.211.55.48:9200/_search?q='zgx'&pretty"
{
  "took" : 74,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.17225473,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "4",
        "_score" : 0.17225473,
        "_source" : {
          "name" : "zgx",
          "age" : 25,
          "class" : "English"
        }
      },
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "6",
        "_score" : 0.17225473,
        "_source" : {
          "name" : "zhang guoxing",
          "age" : 25,
          "desc" : "zgx"
        }
      }
    ]
  }
}

Each hit also carries a _score that rates how well it matches the query.
~]# curl "10.211.55.48:9200/_search?q='zhang%20guoxing'&pretty"
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.3097504,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "6",
        "_score" : 1.3097504,
        "_source" : {
          "name" : "zhang guoxing",
          "age" : 25,
          "desc" : "zgx"
        }
      },
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "5",
        "_score" : 0.5753642,
        "_source" : {
          "name" : "zhang guoxing",
          "age" : 25,
          "class" : "English"
        }
      }
    ]
  }
}
~]# curl "10.211.55.48:9200/_search?q=name:'zhang%20guoxing'&pretty"
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.6548752,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "6",
        "_score" : 0.6548752,
        "_source" : {
          "name" : "zhang guoxing",
          "age" : 25,
          "desc" : "zgx"
        }
      },
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "5",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "zhang guoxing",
          "age" : 25,
          "class" : "English"
        }
      }
    ]
  }
}
~]# curl "10.211.55.48:9200/_search?q=name:'zgx'&pretty"
{
  "took" : 27,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.80259144,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "4",
        "_score" : 0.80259144,
        "_source" : {
          "name" : "zgx",
          "age" : 25,
          "class" : "English"
        }
      }
    ]
  }
}
The first two examples search the _all field.
The last two search only the specified field (name).
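Query strings such as the ones above have to be URL-encoded when they contain spaces or colons. The sketch below builds such a search URL with the Python standard library; `search_url` is a hypothetical helper, and the host and queries are taken from the examples in this article.

```python
from urllib.parse import quote, urlencode


def search_url(host, query_string, index=None, port=9200):
    """Build a URI-search URL; without a field prefix the query runs against _all."""
    path = f"/{index}/_search" if index else "/_search"
    # quote_via=quote encodes spaces as %20, as in the examples above.
    params = urlencode({"q": query_string, "pretty": "true"}, quote_via=quote)
    return f"http://{host}:{port}{path}?{params}"


if __name__ == "__main__":
    print(search_url("10.211.55.48", "zhang guoxing"))               # searches _all
    print(search_url("10.211.55.48", "name:zgx", index="students"))  # searches the name field
```

The colon is sent as %3A, which the server decodes back to the field separator before parsing the query.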
Data types: string, number, boolean, dates
Viewing the mapping of a type:
~]# curl "10.211.55.48:9200/students/_mapping/class1?pretty"
{
  "students" : {
    "mappings" : {
      "class1" : {
        "properties" : {
          "age" : {
            "type" : "long"
          },
          "class" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "desc" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

This shows how each field in the type is mapped. String fields appear here as text with a keyword sub-field.
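Broadly speaking, the data searched in ES falls into two types:
- exact values: raw, unprocessed values that are matched exactly at search time, similar to a SQL statement
- full text: textual data, where the question is how well a document matches the query, that is, how relevant the document is to the user's request. This is where ES is strongest
To support full-text search, ES first has to analyze the text and build an inverted index. The data in the inverted index also has to be normalized, for example lowercased. Different analyzers apply different rules, so search results vary with the analyzer that processed the text.
This whole process is called analysis. In Lucene terms, analysis is tokenization plus normalization on the way to building the inverted index. Analysis is performed by an analyzer, which consists of three components: character filters, a tokenizer, and token filters. ES built-in analyzers include:
- Standard analyzer
- Simple analyzer
- Whitespace analyzer
- Language analyzer
Analyzers are used not only when an index is built but also when a query is built. If the analyzer used at index time differs from the one used at query time, the query results may not be what you expect.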
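The three-stage structure of an analyzer can be sketched as a chain of small functions. This is a simplified illustration, not one of the built-in ES analyzers: the HTML-stripping character filter, the regex tokenizer, and the tiny stopword list are all assumptions chosen for the example.

```python
import re

STOPWORDS = frozenset({"the", "a", "an"})  # hypothetical stopword list


def char_filter(text):
    """Character filter: replace HTML tags with spaces before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens):
    """Token filters: lowercase every token and drop stopwords."""
    return [t.lower() for t in tokens if t.lower() not in STOPWORDS]


def analyze(text):
    """Run the three stages in order, as an analyzer does."""
    return token_filter(tokenizer(char_filter(text)))


if __name__ == "__main__":
    print(analyze("<b>The</b> Quick Fox"))
    print(analyze("English"))
```

Since the indexed term for "English" is the lowercase "english", an unanalyzed lookup for the capitalized form finds nothing, which is the index-time versus query-time mismatch described above.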
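Query DSL
The Query DSL is sent in the request body.
It falls into two categories:
- query DSL: used for full-text queries, where results are judged by relevance. Query execution is complex and the results are not cached
- filter DSL: used for exact-value queries, where each document is judged as a plain yes or no. Filters are fast and their results are cached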
Filter DSL
term filter: exactly matches documents that contain the specified term
~]# curl "10.211.55.24:9200/students/_search?pretty" -d '{
  "query":{
    "term":{
      "name": "jusene"
    }
  }
}'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "name" : "jusene",
          "age" : 25,
          "class" : "English"
        }
      },
      {
        "_index" : "students",
        "_type" : "class1",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "jusene",
          "age" : 25,
          "class" : "English"
        }
      }
    ]
  }
}
terms filter: exactly matches any of several exact values
~]# curl "10.211.55.48:9200/students/_search?pretty" -d '{
  "query":{
    "terms":{
      "name":["jusene","zgx"]
    }
  }
}'
range filter: finds numbers or dates within a given range
~]# curl "10.211.55.48:9200/students/_search?pretty" -d '{
  "query":{
    "range":{
      "age":{
        "lt":25
      }
    }
  }
}'
exists filter: matches documents in which the given field has a value
~]# curl "10.211.55.48:9200/students/_search?pretty" -d '{
  "query":{
    "exists":{
      "field": "age"
    }
  }
}'
boolean filter
Combines multiple filter clauses with boolean logic
must: all of its clauses have to match, i.e. and
must_not: none of its clauses may match, i.e. not
should: at least one clause should match, i.e. or
~]# curl "10.211.55.48:9200/students/_search?pretty" -d '{
  "query":{
    "bool":{
      "must":{
        "term":{"age": 24}
      },
      "must_not":{
        "term":{"name":"zgx"}
      },
      "should":[
        {"term":{"class.keyword":"English"}},
        {"term":{"class.keyword":"Math"}}
      ]
    }
  }
}'
The should clauses use the class.keyword sub-field because term values are not analyzed, while the class text field is indexed in lowercase.
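How the three clause types combine can be sketched by evaluating them over in-memory documents. This is a simplified model under a stated assumption: it follows the query-context default in which should clauses become optional once a must clause is present, and are required only when there is no must clause. `term` and `matches_bool` are hypothetical helpers and the sample documents are made up.

```python
def term(field, value):
    """Build a predicate that checks a field for an exact value."""
    return lambda doc: doc.get(field) == value


def matches_bool(doc, must=(), must_not=(), should=()):
    """Evaluate a simplified bool clause set against one document."""
    if not all(pred(doc) for pred in must):
        return False  # every must clause has to match (and)
    if any(pred(doc) for pred in must_not):
        return False  # no must_not clause may match (not)
    if should and not must:
        return any(pred(doc) for pred in should)  # or, when nothing else is required
    return True


# Hypothetical sample documents.
docs = [
    {"name": "jack", "age": 24, "class": "Math"},
    {"name": "zgx", "age": 24, "class": "English"},
    {"name": "jusene", "age": 25, "class": "English"},
]
hits = [
    d["name"] for d in docs
    if matches_bool(
        d,
        must=[term("age", 24)],
        must_not=[term("name", "zgx")],
        should=[term("class", "English"), term("class", "Math")],
    )
]
print(hits)
```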
Query DSL
match_all: matches all documents. When no query is specified, match_all is the default
~]# curl '10.211.55.48:9200/_search?pretty' -d '
{
  "query": {"match_all": {}}
}'
match: runs full-text or exact-value queries on almost any field
For a full-text query, the query string is analyzed first:
~]# curl "10.211.55.48:9200/_search?pretty" -d '{
  "query":{
    "match":{"name":"zgx"}
  }
}'

For an exact-value query, that is, a search for a precise value, a filter is recommended instead of a query:
~]# curl "10.211.55.48:9200/students/_search?pretty" -d '{
  "query":{
    "match":{"name":"zgx"}
  }
}'
multi_match: runs the same query on multiple fields
~]# curl "10.211.55.48:9200/_search?pretty" -d '{
  "query":{
    "multi_match":{
      "query":"zgx",
      "fields":["name","desc"]
    }
  }
}'
bool query: combines multiple query clauses with boolean logic. Unlike the bool filter, a query clause does not return yes or no but a relevance score, so the bool query merges the scores of its clauses
~]# curl "10.211.55.48:9200/students/_search?pretty" -d '{
  "query":{
    "bool":{
      "must":{
        "range":{"age":{"gte": 24}}
      },
      "must_not":{
        "match":{"name":"zgx"}
      },
      "should":[
        {"match":{"class":"English"}},
        {"match":{"class":"Math"}}
      ]
    }
  }
}'
wildcard query: shell-style wildcard matching
~]# curl "10.211.55.48:9200/students/class1/_search?pretty" -d '{
  "query":{
    "wildcard":{
      "name":"z*x"
    }
  }
}'
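regexp query: regular-expression matching. Regexp queries apply to text and keyword fields, not to numeric fields such as age, so this example matches on name
~]# curl "10.211.55.48:9200/_search?pretty" -d '{
  "query":{
    "regexp":{
      "name":"z.*"
    }
  }
}'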
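prefix query: prefix matching. The prefix is not analyzed, so it is written in lowercase to match the lowercased terms in the index
~]# curl "10.211.55.48:9200/_search?pretty" -d '{
  "query":{
    "prefix":{
      "class":"m"
    }
  }
}'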
phrase match: matches a whole phrase
~]# curl "10.211.55.48:9200/_search?pretty" -d '{
  "query":{
    "match_phrase":{
      "name": "zhang guoxing"
    }
  }
}'
Compound queries
That is, using the filter DSL and the query DSL together. The filtered query that earlier releases used for this was removed in ES 5.0; on 5.x the same combination is written as a bool query with a filter clause:
~]# curl "10.211.55.48:9200/_search?pretty" -d '{
  "query":{
    "bool":{
      "must":{
        "match":{
          "name":"jusene"
        }
      },
      "filter":{
        "range":{
          "age":{"gt":24}
        }
      }
    }
  }
}'
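On ES 5.x a scored query and a non-scoring filter are combined in a bool query, with the scored part under must and the yes/no part under filter. The sketch below builds such a request body in Python; `filtered_search` is a hypothetical helper, and the field names come from the example above.

```python
import json


def filtered_search(query_clause, filter_clause):
    """Combine a scoring query with a yes/no filter in one bool query."""
    return {
        "query": {
            "bool": {
                "must": query_clause,     # contributes to the relevance score
                "filter": filter_clause,  # only includes or excludes documents
            }
        }
    }


body = filtered_search(
    {"match": {"name": "jusene"}},
    {"range": {"age": {"gt": 24}}},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d as the request body of a _search call.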
Highlighted search
~]# curl "10.211.55.48:9200/_search?pretty" -d '{
  "query":{
    "match":{
      "name":"jusene"
    }
  },
  "highlight":{
    "fields":{
      "name":{}
    }
  }
}'
The response includes the matching text from the name field, with the matched words wrapped in <em></em> tags.
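What highlighting does to a field value can be imitated in a few lines. This is only a sketch of the visible effect, using the default <em> tags: `highlight` is a hypothetical function based on a whole-word regular expression, not the highlighter ES uses internally.

```python
import re


def highlight(text, term, pre="<em>", post="</em>"):
    """Wrap every whole-word, case-insensitive occurrence of term in the given tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


if __name__ == "__main__":
    print(highlight("jusene likes search", "jusene"))
```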
Validating DSL syntax
~]# curl "10.211.55.48:9200/students/_validate/query?pretty" -d '<BODY>'
References:
https://es.xiaoleilu.com/index.html
http://www.cnblogs.com/ghj1976/p/5293250.html