Chapter 5: Document Operations
Learn Elasticsearch document CRUD operations, bulk operations, version control, and expired-document management.
Last updated: 2024-01-15
5.1 Basic Document Operations
5.1.1 Creating a Document
# Auto-generated ID
POST /products/_doc
{
"name": "iPhone 15",
"price": 7999.00,
"category": "electronics",
"tags": ["phone", "Apple", "5G"]
}
# Response
{
"_index": "products",
"_id": "abc123",
"_version": 1,
"result": "created",
"_shards": { "total": 2, "successful": 1, "failed": 0 }
}
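5.1.2 Creating with a Specified ID
# If a document with this ID already exists it is replaced; use PUT /products/_create/1 to fail instead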
PUT /products/_doc/1
{
"name": "iPhone 15 Pro",
"price": 9999.00,
"category": "electronics",
"stock": 100,
"created_at": "2024-01-15T10:30:00Z"
}
5.1.3 Reading a Document
# Basic read
GET /products/_doc/1
# Response
{
"_index": "products",
"_id": "1",
"_version": 3,
"_seq_no": 15,
"_primary_term": 1,
"found": true,
"_source": {
"name": "iPhone 15 Pro",
"price": 9999.00,
...
}
}
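# Fetch only the _source
GET /products/_source/1
# Check whether the document exists
HEAD /products/_doc/1
5.1.4 Updating a Document
# Full replacement (fields omitted from the new body, such as stock, are dropped)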
PUT /products/_doc/1
{
"name": "iPhone 15 Pro Max",
"price": 10999.00,
"category": "electronics"
}
# Partial update (merges the given fields into the existing document)
POST /products/_update/1
{
"doc": {
"price": 9999.00,
"stock": 50
}
}
# Update with a script
POST /products/_update/1
{
"script": {
"source": "ctx._source.stock -= params.count",
"params": {
"count": 5
}
}
}
# Append a value to the tags array
POST /products/_update/1
{
"script": {
"source": "ctx._source.tags.add(params.tag)",
"params": {
"tag": "hot-sale"
}
}
}
# Remove a field
POST /products/_update/1
{
"script": {
"source": "ctx._source.remove('deleted')"
}
}
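The `doc` form of `_update` merges the supplied fields into the stored `_source`. The following is a minimal local sketch of that merge behavior in Python, run without a cluster. `merge_partial` is a hypothetical helper name, not an Elasticsearch API, and the sketch assumes nested objects merge recursively while all other values, including arrays, replace the stored value.

```python
import copy


def merge_partial(source: dict, doc: dict) -> dict:
    """Return a new dict with `doc` merged into `source` (hypothetical helper)."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays replace the stored value.
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"name": "iPhone 15 Pro Max", "price": 10999.00, "category": "electronics"}
updated = merge_partial(stored, {"price": 9999.00, "stock": 50})
print(updated)
```

Fields that the partial document does not mention (`name`, `category`) survive, which is the difference from a full `PUT` replacement.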
5.1.5 Deleting a Document
# Delete a single document
DELETE /products/_doc/1
# Delete by query
POST /products/_delete_by_query
{
"query": {
"term": {
"category": "discontinued"
}
}
}
5.2 Bulk Operations
5.2.1 Bulk API Format
POST /_bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "name": "Product 1", "price": 100 }
{ "create": { "_index": "products", "_id": "2" } }
{ "name": "Product 2", "price": 200 }
{ "update": { "_index": "products", "_id": "1" } }
{ "doc": { "price": 150 } }
{ "delete": { "_index": "products", "_id": "3" } }
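The bulk body is newline-delimited JSON: one action line, followed by a payload line for every action except `delete`, with a trailing newline at the end. As a rough sketch, a client could assemble such a body like this; `build_bulk_body` and its tuple layout are illustrative assumptions, not part of any official client.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, payload) tuples into an NDJSON bulk body."""
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if action != "delete":
            # index/create/update carry a second line; delete does not.
            lines.append(json.dumps(payload, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "products", "_id": "1"}, {"name": "Product 1", "price": 100}),
    ("create", {"_index": "products", "_id": "2"}, {"name": "Product 2", "price": 200}),
    ("update", {"_index": "products", "_id": "1"}, {"doc": {"price": 150}}),
    ("delete", {"_index": "products", "_id": "3"}, None),
])
print(body, end="")
```

The resulting string would be sent as the request body with the `Content-Type: application/x-ndjson` header.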
5.2.2 Bulk Import Example
POST /products/_bulk
{ "index": {} }
{ "name": "Laptop", "price": 5999, "category": "electronics" }
{ "index": {} }
{ "name": "Headphones", "price": 299, "category": "electronics" }
{ "index": {} }
{ "name": "Keyboard", "price": 199, "category": "electronics" }
{ "index": {} }
{ "name": "Mouse", "price": 99, "category": "electronics" }
5.2.3 Bulk Operations Across Indices
POST /_bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "name": "Product A" }
{ "index": { "_index": "orders", "_id": "1" } }
{ "product_id": "1", "quantity": 2 }
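5.2.4 Error Handling
# The Bulk API has no continue_on_failure parameter: every action runs independently,
# so a failed item never aborts the rest. Check "errors" and each item's status in the response.
POST /products/_bulk
{ "index": {} }
{ "name": "P1" }
{ "index": {} }
{ "invalid_field": "rejected only if the mapping forbids it, e.g. dynamic: strict" }
{ "index": {} }
{ "name": "P2" }
# Return only the error details (request body omitted here)
POST /products/_bulk?filter_path=errors,items.*.error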
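Because each bulk item succeeds or fails on its own, the client has to inspect the response rather than rely on the HTTP status. The sketch below pulls the failed items out of a parsed response. `failed_items` is a hypothetical helper, and the sample response is hand-written to resemble the bulk response shape; its values are illustrative only.

```python
def failed_items(bulk_response: dict) -> list:
    """Return (position, action, status, error_type) for each failed item."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a single-key dict: {"index": {...}}, {"create": {...}}, ...
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append(
                (position, action, result.get("status"), result["error"].get("type"))
            )
    return failures


# Hand-written sample shaped like a bulk response; values are illustrative only.
sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "a", "status": 201, "result": "created"}},
        {"index": {"_id": "b", "status": 400,
                   "error": {"type": "strict_dynamic_mapping_exception", "reason": "..."}}},
        {"index": {"_id": "c", "status": 201, "result": "created"}},
    ],
}
print(failed_items(sample))
```

The position in `items` matches the position of the action in the request, so a failed entry can be mapped back to the document that caused it and retried or logged.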
5.3 Multi-Document Operations
5.3.1 MGet - Batch Reads
GET /_mget
{
"docs": [
{ "_index": "products", "_id": "1" },
{ "_index": "products", "_id": "2" },
{ "_index": "products", "_id": "3", "_source": ["name", "price"] }
]
}
# Shorthand when all documents are in the same index
GET /products/_mget
{
"ids": ["1", "2", "3"]
}
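An `_mget` response returns one entry per requested document, each with a `found` flag, so missing documents do not fail the whole request. A small sketch of separating hits from misses follows; `split_mget` is a hypothetical helper and the sample response is hand-written for illustration.

```python
def split_mget(response: dict):
    """Split an _mget response into {id: source} for hits and a list of missing IDs."""
    found, missing = {}, []
    for doc in response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc.get("_source", {})
        else:
            missing.append(doc["_id"])
    return found, missing


# Hand-written sample shaped like an _mget response; values are illustrative only.
sample = {
    "docs": [
        {"_index": "products", "_id": "1", "found": True, "_source": {"name": "Product 1"}},
        {"_index": "products", "_id": "2", "found": True, "_source": {"name": "Product 2"}},
        {"_index": "products", "_id": "3", "found": False},
    ]
}
hits, misses = split_mget(sample)
print(hits, misses)
```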
5.3.2 MSearch - Batch Searches
POST /_msearch
{"index": "products"}
{"query": {"match": {"name": "iPhone"}}, "from": 0, "size": 10}
{"index": "products"}
{"query": {"term": {"category": "electronics"}}, "from": 0, "size": 5}
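5.3.3 Bulk Tuning Tips
# Control the batch size on the client: the Bulk API has no batch_size parameter,
# so the batch size is simply the number of actions sent in one request body.
POST /products/_bulk
# Choose a refresh policy
# refresh=false (default): do not force a refresh
POST /products/_bulk?refresh=false
# refresh=true: refresh the affected shards immediately
POST /products/_bulk?refresh=true
# refresh=wait_for: respond only after a refresh has made the changes visible
POST /products/_bulk?refresh=wait_for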
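Batch size is decided by the client, which splits its stream of actions into requests of a manageable size. A minimal sketch of that chunking is shown here; `chunked` is an illustrative helper and the size of 1000 is an arbitrary assumption, not a recommended value.

```python
from itertools import islice


def chunked(actions, size):
    """Yield lists of at most `size` actions from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(actions)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


docs = [{"name": f"P{i}"} for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(docs, 1000)]
print(batch_sizes)
```

Each yielded batch would become the body of one `_bulk` request. Working from an iterator keeps memory use flat even when the source is a large file or a database cursor.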
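5.4 Optimistic Concurrency Control
5.4.1 Using Sequence Number and Primary Term
# Conditional write: applied only if the document's current _seq_no and _primary_term match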
PUT /products/_doc/1?if_seq_no=10&if_primary_term=1
{
"name": "Updated Name",
"price": 7999
}
# Response on version conflict (HTTP 409)
{
"error": {
"type": "version_conflict_engine_exception",
...
}
}
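5.4.2 Using External Versions
# Use a version number maintained by an external system; the write succeeds only if it is greater than the stored version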
PUT /products/_doc/1?version=12345&version_type=external
{
"name": "Updated Name"
}
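With `version_type=external`, a write is accepted only when the supplied version is strictly greater than the stored one; the `external_gte` variant also accepts an equal version. A short sketch of that acceptance rule, using a hypothetical function name:

```python
def accepts_external_version(stored_version, incoming_version, allow_equal=False):
    """Mimic version_type=external (or external_gte when allow_equal=True)."""
    if stored_version is None:
        # No document stored yet: any version is accepted.
        return True
    if allow_equal:
        return incoming_version >= stored_version
    return incoming_version > stored_version


print(accepts_external_version(12344, 12345))
print(accepts_external_version(12345, 12345))
print(accepts_external_version(12345, 12345, allow_equal=True))
```

This makes replays from an external system of record safe: an older or duplicate change arriving late is rejected instead of overwriting newer data.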
5.4.3 retry_on_conflict
# Retry automatically on a version conflict (up to 3 times)
POST /products/_update/1?retry_on_conflict=3
{
"doc": {
"price": 7999
}
}
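The conditional write and the retry option are two halves of the same read-modify-write loop. The sketch below simulates that loop in memory, without a cluster. `TinyStore`, `VersionConflict`, and `update_with_retry` are hypothetical names, and the per-document sequence counter is a simplification (in Elasticsearch, sequence numbers are assigned per shard).

```python
class VersionConflict(Exception):
    """Raised when the caller's seq_no/primary_term no longer match."""


class TinyStore:
    """In-memory stand-in for one document; not an Elasticsearch client."""

    def __init__(self, source):
        self.source = dict(source)
        self.seq_no = 0
        self.primary_term = 1

    def get(self):
        return dict(self.source), self.seq_no, self.primary_term

    def put(self, source, if_seq_no, if_primary_term):
        # Compare-and-set: reject the write if someone else wrote in between.
        if (if_seq_no, if_primary_term) != (self.seq_no, self.primary_term):
            raise VersionConflict(f"expected seq_no {if_seq_no}, current is {self.seq_no}")
        self.source = dict(source)
        self.seq_no += 1
        return self.seq_no


def update_with_retry(store, change, retries=3):
    """Read, modify, conditionally write; re-read and retry on conflict."""
    for attempt in range(retries + 1):
        source, seq_no, term = store.get()
        change(source)
        try:
            return store.put(source, seq_no, term)
        except VersionConflict:
            if attempt == retries:
                raise


store = TinyStore({"price": 9999, "stock": 50})
_, stale_seq_no, stale_term = store.get()
store.put({"price": 9999, "stock": 45}, stale_seq_no, stale_term)  # another writer wins
try:
    store.put({"price": 7999, "stock": 50}, stale_seq_no, stale_term)
    conflict = False
except VersionConflict:
    conflict = True

update_with_retry(store, lambda doc: doc.update(price=7999))
print(conflict, store.source)
```

The stale write is rejected, so the other writer's stock change is not lost. The retry loop re-reads the current state first, which is roughly what `retry_on_conflict` does on the server side for `_update` requests.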
5.5 Document Routing
5.5.1 How Routing Works
Routing value (document ID by default) ──hash % num_shards──► [0, num_shards-1] ──► shard number
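The mapping from routing value to shard can be sketched as a hash followed by a modulo. The example below uses CRC32 purely as a stand-in hash, which is an assumption made for illustration (Elasticsearch uses a Murmur3 hash internally), so the shard numbers it prints will not match a real cluster. `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number in [0, num_primary_shards - 1]."""
    # Stand-in hash (CRC32); Elasticsearch itself uses a Murmur3 hash.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


shards = {doc_id: shard_for(doc_id, 5) for doc_id in ["1", "2", "3", "user-123"]}
print(shards)
```

Because the result depends on the modulo, changing the number of primary shards would send existing IDs to different shards. This is why the primary shard count is fixed when the index is created.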
5.5.2 Custom Routing
# Specify routing when indexing
PUT /products/_doc/1?routing=user-123
{
"name": "Product",
"user_id": "user-123"
}
# Specify the same routing when reading
GET /products/_doc/1?routing=user-123
# Specify routing when searching
GET /products/_search?routing=user-123
{
"query": {
"term": { "user_id": "user-123" }
}
}
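5.5.3 Routing Strategy
# Index setting, fixed at index creation: spread each routing value over 2 shards instead of 1
{
"settings": {
"index.routing_partition_size": 2
}
}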
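5.6 Expired-Document Management
5.6.1 TTL Field (Deprecated)
# Not recommended (the _ttl field was removed in Elasticsearch 5.0); attach an ILM policy to the index instead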
PUT /logs
{
"mappings": {
"properties": {
"message": { "type": "text" },
"@timestamp": { "type": "date" }
}
},
"settings": {
"index.lifecycle.name": "logs-policy"
}
}
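5.6.2 Index Lifecycle Management (ILM)
# Create a lifecycle policy
# Note: the freeze action used in the cold phase is deprecated in recent Elasticsearch releases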
PUT /_ilm/policy/logs-policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_primary_shard_size": "50gb",
"max_age": "7d"
}
}
},
"warm": {
"min_age": "30d",
"actions": {
"shrink": { "number_of_shards": 1 },
"forcemerge": { "max_num_segments": 1 }
}
},
"cold": {
"min_age": "60d",
"actions": {
"freeze": {}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
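To make the phase thresholds concrete, the following sketch maps an index age to the phase this policy would place it in. `phase_for_age` is a hypothetical helper. It assumes age is counted from rollover (as ILM does when a rollover action is used) and ignores the fact that ILM applies transitions on a periodic check rather than instantly.

```python
from datetime import timedelta

# min_age thresholds copied from the policy above; "hot" has no min_age.
PHASE_MIN_AGE = [
    ("hot", timedelta(0)),
    ("warm", timedelta(days=30)),
    ("cold", timedelta(days=60)),
    ("delete", timedelta(days=90)),
]


def phase_for_age(age: timedelta) -> str:
    """Return the last phase whose min_age the index has reached."""
    current = "hot"
    for phase, min_age in PHASE_MIN_AGE:
        if age >= min_age:
            current = phase
    return current


for days in (10, 30, 75, 90):
    print(days, phase_for_age(timedelta(days=days)))
```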
5.7 Command Summary
| Operation | Command |
|---|---|
| Create a document | POST /index/_doc or PUT /index/_doc/{id} |
| Read a document | GET /index/_doc/{id} |
| Update a document | POST /index/_update/{id} |
| Delete a document | DELETE /index/_doc/{id} |
| Bulk write | POST /_bulk |
| Batch read | GET /_mget |
| Delete by query | POST /index/_delete_by_query |
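5.8 Performance Tips
- Batch writes: use the Bulk API to reduce network overhead
- Disable refresh: set refresh_interval: -1 during bulk imports
- Size shards sensibly: avoid many small shards
- Use routing: for queries that have a clear routing key
- Control replicas: enable replicas only after the import completes
# Import tuning example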
PUT /products/_settings
{
"index": {
"refresh_interval": "-1",
"number_of_replicas": 0
}
}
# ... run the bulk import ...
PUT /products/_settings
{
"index": {
"refresh_interval": "1s",
"number_of_replicas": 1
}
}
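One risk with this pattern is that a failed import leaves the index with refresh disabled and no replicas. A possible safeguard, sketched here, wraps the import so the settings are restored even on error. `bulk_import_settings` is a hypothetical helper, and `put_settings(index, body)` is an assumed caller-supplied function that would send `PUT /<index>/_settings`; the example passes a recording stub instead of a real client.

```python
from contextlib import contextmanager


@contextmanager
def bulk_import_settings(put_settings, index, refresh_interval="1s", replicas=1):
    """Relax index settings for an import and restore them afterwards.

    `put_settings(index, body)` is a caller-supplied function (hypothetical here)
    that sends PUT /<index>/_settings with the given body.
    """
    put_settings(index, {"index": {"refresh_interval": "-1", "number_of_replicas": 0}})
    try:
        yield
    finally:
        # Restore even if the import raises, so refresh is not left disabled.
        put_settings(index, {"index": {"refresh_interval": refresh_interval,
                                       "number_of_replicas": replicas}})


calls = []
with bulk_import_settings(lambda index, body: calls.append((index, body)), "products"):
    calls.append(("products", "bulk import runs here"))
print(calls)
```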
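5.9 Summary
This chapter covered Elasticsearch document CRUD operations, bulk operations, version control, and expired-document management. Fluency with these operations is the foundation for working with data in Elasticsearch. The next chapter covers the query DSL.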