Elasticsearch

2022-04-29 · about 2,164 words · estimated reading time: 5 minutes

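Elasticsearch is a distributed, RESTful search and analytics engine that handles a growing number of use cases. As the heart of the Elastic Stack, it stores your data centrally for fast search, fine-tuned relevance, and analytics that scale easily.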

Installation

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.3-darwin-aarch64.tar.gz
tar zxvf elasticsearch-8.1.3-darwin-aarch64.tar.gz
cd elasticsearch-8.1.3
code config/elasticsearch.yml

http.port: 9200
path.data: /Users/ynthm/Apps/elastic/es-data/data
path.logs: /Users/ynthm/Apps/elastic/es-data/logs

bin/elasticsearch

✅ Elasticsearch security features have been automatically configured! ✅ Authentication is enabled and cluster connections are encrypted.

ℹ️ Password for the elastic user (reset with bin/elasticsearch-reset-password -u elastic): _3KGfFfp=LgwTIocCwul

ℹ️ HTTP CA certificate SHA-256 fingerprint: 4c9c8b6e8125b6d3ef4a1c556c81c71b1e0906724a2156ccbd14707d42ed2078

ℹ️ Configure Kibana to use this cluster: • Run Kibana and click the configuration link in the terminal when Kibana starts. • Copy the following enrollment token and paste it into Kibana in your browser (valid for the next 30 minutes): eyJ2ZXIiOiI4LjEuMyIsImFkciI6WyIxMC4wLjAuMTEzOjkyMDAiLCIxMC4wLjAuMjAwOjkyMDAiXSwiZmdyIjoiNGM5YzhiNmU4MTI1YjZkM2VmNGExYzU1NmM4MWM3MWIxZTA5MDY3MjRhMjE1NmNjYmQxNDcwN2Q0MmVkMjA3OCIsImtleSI6IkpDQnlkWUFCR0ZGVGZReVBYMDJ0OjJEbnlUR1FhU1dHYVMxSU1YVXNtX0EifQ==

ℹ️ Configure other nodes to join this cluster: • On this node: ⁃ Create an enrollment token with bin/elasticsearch-create-enrollment-token -s node. ⁃ Uncomment the transport.host setting at the end of config/elasticsearch.yml. ⁃ Restart Elasticsearch. • On other nodes: ⁃ Start Elasticsearch with bin/elasticsearch --enrollment-token <token>, using the enrollment token that you generated.

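On first start, Elasticsearch creates the elastic user with a generated password, plus an enrollment token for connecting Kibana the first time. To form a cluster, uncomment the transport.host setting in the Elasticsearch configuration. Before another node joins, run bin/elasticsearch-create-enrollment-token -s node on the existing node to generate a new token, then start the new node with bin/elasticsearch --enrollment-token <token> to join the cluster.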
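
The enrollment token printed at startup appears to be Base64-encoded JSON (the one above starts with eyJ2ZXIi, which is {"ver" encoded). A minimal Python sketch, assuming that layout, decodes a token for inspection. The sample values are made up, and the field meanings in the comments are inferred from the decoded output rather than taken from documentation.

```python
import base64
import json


def decode_enrollment_token(token: str) -> dict:
    """Decode a Base64-encoded JSON enrollment token into a dict."""
    return json.loads(base64.b64decode(token))


# Hypothetical token built locally for illustration (not a real credential).
sample = base64.b64encode(json.dumps({
    "ver": "8.1.3",             # Elasticsearch version
    "adr": ["127.0.0.1:9200"],  # node addresses
    "fgr": "00ff",              # CA certificate fingerprint (shortened here)
    "key": "id:secret",         # presumably the key used during enrollment
}).encode()).decode()

info = decode_enrollment_token(sample)
print(info["ver"], info["adr"])
```

Since the token carries a credential, treat it like a password and avoid pasting real tokens into logs or posts.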

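cluster.name: the cluster name. The default is elasticsearch; changing it is recommended, because in old versions that used multicast discovery, nodes on the same network segment with the same cluster name joined the same cluster automatically, which can mix up data and operations in production.
node.name: the node name. Each node in a cluster must have a distinct name so that nodes can be told apart and identified.
node.master: whether the node is master-eligible (true or false). A value of true only makes the node a candidate in master elections; it does not guarantee that the node is elected. If the elected master fails, a new master is elected from the eligible nodes, and the old master does not automatically take the role back after it recovers.
node.data: whether the node holds data (true or false). Data nodes handle data-related operations.
Note: node.master and node.data are legacy settings. In 8.x, including the 8.1.3 release used here, node roles are configured with node.roles instead, for example node.roles: [ master, data ].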

wget https://artifacts.elastic.co/downloads/kibana/kibana-8.1.3-darwin-aarch64.tar.gz
tar zxvf kibana-8.1.3-darwin-aarch64.tar.gz
cd kibana-8.1.3
code config/kibana.yml

server.port: 5601
elasticsearch.hosts: ["http://localhost:9200"]
path.data: /Users/ynthm/Apps/elastic/kibana-data

bin/kibana
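# On macOS, security checks may block the first run: you may need to launch it several times and approve it twice under System Preferences > Security & Privacy
# http://localhost:5601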

Management > Stack Monitoring > Set up monitoring with Metricbeat

Management > Dev Tools can be used to query Elasticsearch.

curl --cacert config/certs/http_ca.crt -u elastic https://localhost:9200
bin/elasticsearch-reset-password -u elastic
bin/elasticsearch-create-enrollment-token -s kibana
bin/elasticsearch-keystore list
bin/elasticsearch-keystore remove <name>

bin/elasticsearch-certutil ca
bin/elasticsearch-keystore show xpack.security.transport.ssl.keystore.secure_password

bin/elasticsearch-certutil http
bin/elasticsearch-keystore add xpack.security.http.ssl.keystore.secure_password
bin/elasticsearch-keystore show xpack.security.http.ssl.keystore.secure_password
# https://www.elastic.co/guide/en/elasticsearch/reference/current/security-basic-setup-https.html

bin/elasticsearch-reset-password -u kibana_system
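
For scripts, the first curl command above can be approximated with the Python standard library. This is a sketch under assumptions: the CA file sits at config/certs/http_ca.crt relative to the working directory, the password is supplied by the caller, and build_request and cluster_info are hypothetical helper names.

```python
import base64
import json
import ssl
import urllib.request


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying HTTP Basic credentials (like curl -u)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def cluster_info(password: str,
                 url: str = "https://localhost:9200",
                 cafile: str = "config/certs/http_ca.crt") -> dict:
    """Call the root endpoint, trusting only the given CA (like curl --cacert)."""
    context = ssl.create_default_context(cafile=cafile)
    request = build_request(url, "elastic", password)
    with urllib.request.urlopen(request, context=context) as resp:
        return json.load(resp)


# Building the request needs no running cluster:
req = build_request("https://localhost:9200", "elastic", "changeme")
print(req.get_header("Authorization"))
```

Calling cluster_info("<password>") against a running node should return the same JSON document that curl prints.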

Security settings in Elasticsearch

https://www.elastic.co/guide/en/elasticsearch/reference/current/security-settings.html#http-tls-ssl-settings

https://www.elastic.co/guide/en/elasticsearch/reference/current/security-basic-setup.html#encrypt-internode-communication

https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/_encrypted_communication.html

Gotcha: for HTTP TLS/SSL the keystore password must be set, which is easy to miss.

xpack.security.enabled: true

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12
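# secure_password is a secure setting: store it in the Elasticsearch keystore, not in this file:
#   bin/elasticsearch-keystore add xpack.security.http.ssl.keystore.secure_password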

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate 
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-stack-ca.p12

xpack.security.enrollment.enabled: true

Core concepts

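Index (index)
Type (type)
- ES 6.x: an index may have only one type
- ES 7.x: the type defaults to _doc. On older versions you can also specify _doc explicitly to stay consistent
  - Every index must be single-type; multi-type indices are removed entirely
  - The type concept itself is deprecated in 7.x and removed in 8.x
Mapping (mapping): comparable to a schema; defines field types and related settings
- New fields can be added, but the type of an existing field cannot be changed
- A new field does not have to be declared in the initial mapping; with dynamic mapping (the default) it is added when first indexed
Document (doc): one doc corresponds to one row, or record, in a relational database
Field (field): field types
- Core data types
  - String
    - text: analyzed (tokenized) and indexed for full-text search; query it with a match query
    - keyword: not analyzed; matches only the complete field value
  - Numeric
    - long, short, integer, double, float
  - Boolean
  - Binary
  - Range types
  - date
- Complex data types
  - Array: all values in an array must have the same data type
  - Object: the field value is a JSON object
- Specialized data types
  - ip
  - geo_point: latitude/longitude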
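
To make the field types concrete, here is a sketch of a mapping built as a Python dict and serialized to the JSON body that an index-creation request (for example PUT /articles in Dev Tools) would carry. The index name articles and all field names are made up for illustration.

```python
import json

# Hypothetical mapping for an "articles" index; field names are illustrative.
mapping_body = {
    "mappings": {
        "properties": {
            "title":      {"type": "text"},       # analyzed, full-text search
            "status":     {"type": "keyword"},    # exact-value matching
            "views":      {"type": "long"},
            "published":  {"type": "boolean"},
            "created_at": {"type": "date"},
            "client_ip":  {"type": "ip"},
            "location":   {"type": "geo_point"},  # latitude / longitude
            "author": {                           # object field: nested JSON
                "properties": {"name": {"type": "keyword"}}
            },
        }
    }
}

# A full-text query on the text field and an exact match on the keyword field.
match_query = {"query": {"match": {"title": "distributed search"}}}
term_query = {"query": {"term": {"status": "published"}}}

print(json.dumps(mapping_body, indent=2))
```

The match query is analyzed the same way as the text field, while the term query compares the keyword value as a whole.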

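Removing types

The original intent of index and type

Elasticsearch used to compare index and type to a database and a table in a relational database such as MySQL, with the aim of making relationships between data easier to manage. So why are types being removed?

In a relational database, tables are independent of one another (stored separately), but in Elasticsearch the different types of one index are stored in the same Lucene index, so fields with the same name in different types must have the same definition (mapping).
Storing different kinds of records in the same index also hurts Lucene's ability to compress data.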
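
The shared-field constraint can be pictured with a toy model (plain Python, not Elasticsearch code): because all types of an index share one set of Lucene fields, merging per-type mappings has to fail when the same field name is declared with different types. The type and field names are invented.

```python
def merge_type_mappings(types: dict) -> dict:
    """Merge per-type field mappings into one index-wide mapping.

    Raises ValueError when two types declare the same field differently,
    mimicking the shared-field constraint of a multi-type index.
    """
    merged = {}
    for type_name, fields in types.items():
        for field, field_type in fields.items():
            if field in merged and merged[field] != field_type:
                raise ValueError(
                    f"field '{field}' is {merged[field]} elsewhere but "
                    f"{field_type} in type '{type_name}'"
                )
            merged[field] = field_type
    return merged


# Same field name, same type: fine.
ok = merge_type_mappings({
    "user":  {"name": "keyword", "age": "integer"},
    "tweet": {"name": "keyword", "content": "text"},
})
print(ok)

# Same field name, different type: rejected.
try:
    merge_type_mappings({"user": {"name": "keyword"}, "tweet": {"name": "text"}})
except ValueError as err:
    print("conflict:", err)
```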

Elasticsearch 5.6.0

An index can set index.mapping.single_type: true to allow only a single type; from 6.0 this behavior is enforced.

Elasticsearch 6.x

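An index is forced to have only one type. The type name can be chosen freely, but _doc is recommended because it is the name used in 7.0.
_uid is no longer built by combining _type and _id; _uid is now an alias of _id.
From 6.8, include_type_name defaults to true, so the index creation and mapping APIs expect a type name; an index created without an explicit type uses the type name _doc.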

Elasticsearch 7.x

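Specifying types in requests is deprecated, and indexing a document no longer requires a document type: use PUT {index}/_doc/{id} with an explicit ID and POST {index}/_doc for an auto-generated ID. Note that in 7.0 _doc is a permanent part of the path and denotes the endpoint name, not a document type.
The include_type_name parameter of the index creation and mapping APIs defaults to false.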
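
The two typeless endpoints can be captured in a small helper. This is only a sketch: index_doc_endpoint is a hypothetical name, and it just builds the method and path without sending anything.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def index_doc_endpoint(index: str, doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, path) for indexing one document without a type."""
    if doc_id is None:
        # No ID given: Elasticsearch generates one.
        return "POST", f"/{quote(index, safe='')}/_doc"
    return "PUT", f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"


print(index_doc_endpoint("articles", "42"))
print(index_doc_endpoint("articles"))
```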

Elasticsearch 8.x

Specifying types in requests is no longer supported.
The include_type_name parameter is removed.

Pagination

from size

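Not suitable for deep pagination: a request fails when from + size exceeds index.max_result_window (default 10000).
It works like LIMIT with an offset in MySQL: offset + limit rows have to be fetched, and the first offset rows are then discarded to produce the page.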
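
A toy model in plain Python (not Elasticsearch code) shows why deep pages get expensive: every shard has to return its top from + size entries, and the coordinator merges them before throwing away the first from. The shard contents below are invented.

```python
import heapq

MAX_RESULT_WINDOW = 10_000  # default of index.max_result_window


def from_size_page(shards, from_, size):
    """Toy model of from/size paging over several shards.

    Every shard returns its top (from_ + size) values, the coordinator merges
    them, then discards the first from_ entries.
    Returns (page, number_of_values_collected).
    """
    if from_ + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds max_result_window")
    per_shard = [sorted(shard)[: from_ + size] for shard in shards]
    collected = sum(len(hits) for hits in per_shard)
    merged = list(heapq.merge(*per_shard))
    return merged[from_: from_ + size], collected


# Three shards with 1000 values each.
shards = [list(range(0, 3000, 3)), list(range(1, 3000, 3)), list(range(2, 3000, 3))]
page, collected = from_size_page(shards, from_=900, size=10)
print(page, collected)
```

In this model a page of 10 results at offset 900 makes the coordinator collect 2730 entries, and the cost keeps growing with the offset.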

search after

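The drawback of search_after is that you cannot jump to an arbitrary page; you can only move forward page by page (newly indexed data can still show up in real time). The sort must also include at least one unique field as a tiebreaker (typically _id together with a time field).
When using search_after, from must be set to 0 or -1.
If a refresh happens between search_after requests, the results may be inconsistent across pages, because changes made between searches are visible only to more recent points in time (PIT). Running every page against the same PIT avoids this.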

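By default a search request runs against the most recent visible data of the target indices, which is called a point in time. An Elasticsearch PIT (point in time) is a lightweight view into the state of the data as it existed when the PIT was created.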
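
The paging loop can be sketched as follows. search_after_body builds the request body for a search against a PIT (in a real cluster the PIT would be opened first, for example with POST /{index}/_pit?keep_alive=1m, and the body sent to GET /_search). fake_search is a local stand-in for the cluster so the example runs on its own; the PIT id and the field names created_at and doc_id are assumptions made for illustration.

```python
def search_after_body(pit_id, size, last_sort=None, keep_alive="1m"):
    """Build the body for one page of a PIT + search_after search."""
    body = {
        "size": size,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        # A time field plus a unique tiebreaker; field names are illustrative.
        "sort": [{"created_at": "asc"}, {"doc_id": "asc"}],
    }
    if last_sort is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = last_sort
    return body


def fake_search(docs, body):
    """Stand-in for the cluster: applies sort, search_after and size locally."""
    ordered = sorted(docs, key=lambda d: (d["created_at"], d["doc_id"]))
    after = body.get("search_after")
    if after is not None:
        ordered = [d for d in ordered
                   if (d["created_at"], d["doc_id"]) > tuple(after)]
    return [{"_source": d, "sort": [d["created_at"], d["doc_id"]]}
            for d in ordered[: body["size"]]]


# Seven documents with duplicate timestamps, so the tiebreaker matters.
docs = [{"doc_id": f"d{i}", "created_at": i // 2} for i in range(7)]

seen, last_sort = [], None
while True:
    hits = fake_search(docs, search_after_body("hypothetical-pit-id", 3, last_sort))
    if not hits:
        break
    seen += [h["_source"]["doc_id"] for h in hits]
    last_sort = hits[-1]["sort"]
print(seen)
```

Without the unique tiebreaker, documents sharing a timestamp with the last hit of a page could be skipped or repeated.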

scroll

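An efficient scrolling query. The first request keeps a snapshot of the index state in memory and returns a cursor (scroll_id) that records where the previous batch ended. Each following request continues from the cursor (fast but not real-time; typically used for bulk export or reindexing).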
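
A sketch of the scroll loop: the first request opens the scroll, later requests pass the scroll_id back, and the scroll is cleared at the end. send(method, path, body) is a hypothetical transport callable; make_fake_send replaces the cluster with in-memory batches so the example runs on its own.

```python
def scroll_all(send, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit of a query using the scroll API.

    `send(method, path, body)` is a hypothetical transport callable that
    performs the HTTP request and returns the parsed JSON response.
    """
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the search context instead of waiting for it to expire.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def make_fake_send(batches):
    """In-memory stand-in for a cluster that serves the given batches in order."""
    calls = []
    remaining = list(batches)

    def send(method, path, body):
        calls.append((method, path))
        if method == "DELETE":
            return {"succeeded": True}
        hits = remaining.pop(0) if remaining else []
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": hits}}

    return send, calls


send, calls = make_fake_send([[{"_id": "1"}, {"_id": "2"}], [{"_id": "3"}]])
ids = [hit["_id"] for hit in scroll_all(send, "articles", {"match_all": {}}, page_size=2)]
print(ids)
```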
