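Chapter 13: Backup and Restore

Learn how to back up and restore Elasticsearch with snapshots, covering repository configuration, snapshot creation, restore operations, and data migration.

Last updated: 2024-01-15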

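13.1 Backup Overview

13.1.1 Backup Methods

| Method | Description | When to use |
|---|---|---|
| Snapshot API | Built into Elasticsearch | Recommended; the only backup method Elastic supports |
| File system copy | Copying node data directories | Not a reliable backup; at most an offline copy taken with every node stopped |
| Cloud snapshots | Snapshot API with a cloud object store as the repository | Cloud deployments |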

13.1.2 Backup Architecture

┌─────────────────────────────────────────────────────────┐
│              Elasticsearch Cluster                        │
│  ┌─────────────────────────────────────────────────┐    │
│  │  Indices                                         │    │
│  │  ┌───────┐ ┌───────┐ ┌───────┐                 │    │
│  │  │ Shard │ │ Shard │ │ Shard │                 │    │
│  │  └───────┘ └───────┘ └───────┘                 │    │
│  └─────────────────────────────────────────────────┘    │
│                         │ Snapshot                       │
└─────────────────────────┼────────────────────────────────┘
┌─────────────────────────▼────────────────────────────────┐
│                   Snapshot Repository                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │ File System │  │   S3/MinIO  │  │    HDFS     │      │
│  └─────────────┘  └─────────────┘  └─────────────┘      │
└──────────────────────────────────────────────────────────┘

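13.2 Repository Configuration

13.2.1 Creating a Repository

# Shared file system repository
# The location must fall under a path.repo entry in elasticsearch.yml on every node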
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/backup/elasticsearch",
    "compress": true,
    "max_restore_bytes_per_sec": "100mb",
    "max_snapshot_bytes_per_sec": "100mb",
    "chunk_size": "1gb"
  }
}

# Verify the repository
POST /_snapshot/my_backup/_verify
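
Registering a file system repository fails when its location is not covered by a path.repo entry in elasticsearch.yml. As a rough illustration, the Python sketch below builds a request body like the one above and checks the location against a list of path.repo entries before anything is sent to the cluster. The function names and the example path.repo value are assumptions made for this sketch, and it only handles absolute POSIX paths.

```python
from pathlib import PurePosixPath


def is_under_path_repo(location, path_repo_entries):
    """Return True if location equals or sits below one of the path.repo entries."""
    loc = PurePosixPath(location)
    for entry in path_repo_entries:
        base = PurePosixPath(entry)
        if loc == base or base in loc.parents:
            return True
    return False


def fs_repository_body(location, compress=True):
    """Build the body for PUT /_snapshot/<repository> with type "fs"."""
    return {"type": "fs", "settings": {"location": location, "compress": compress}}


# Hypothetical path.repo value; read the real one from elasticsearch.yml.
path_repo = ["/backup"]
body = fs_repository_body("/backup/elasticsearch")
print(is_under_path_repo(body["settings"]["location"], path_repo))
```

The check is purely local and does not replace POST /_snapshot/my_backup/_verify, which confirms that every node can actually reach the repository.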

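13.2.2 S3 Repository

# Install the S3 plugin (Elasticsearch 7.x and earlier only; from 8.0 S3 support is built in)
./bin/elasticsearch-plugin install repository-s3

# Register the S3 repository
# Credentials belong in the Elasticsearch keystore (s3.client.default.access_key and
# s3.client.default.secret_key), not in the request body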
PUT /_snapshot/s3_backup
{
  "type": "s3",
  "settings": {
    "bucket": "my-es-backups",
    "region": "us-east-1",
    "base_path": "backups",
    "compress": true,
    "storage_class": "standard"
  }
}

13.2.3 HDFS Repository

# Install the HDFS plugin
./bin/elasticsearch-plugin install repository-hdfs

# Register the HDFS repository
# load_defaults loads the default Hadoop configuration; individual Hadoop options
# can be added as conf.<key> settings
PUT /_snapshot/hdfs_backup
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020",
    "path": "/user/elasticsearch/backups",
    "load_defaults": true,
    "compress": true
  }
}

13.3 Snapshot Operations

13.3.1 Creating a Snapshot

# Snapshot all indices
PUT /_snapshot/my_backup/snapshot_1

# Snapshot specific indices
PUT /_snapshot/my_backup/snapshot_2
{
  "indices": ["index1", "index2"],
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "admin",
    "taken_date": "2024-01-15",
    "description": "Daily backup"
  }
}

# Asynchronous creation (wait_for_completion defaults to false, so this is also the default behavior)
PUT /_snapshot/my_backup/snapshot_3?wait_for_completion=false
# Response
{
  "accepted": true
}
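
Because an asynchronous request returns before the snapshot is finished, a caller usually has to poll for the final state. The Python sketch below shows one possible polling loop. The fetch_state callable is an assumption of this sketch: it stands for any function that returns the current state string, for example by reading snapshots[0].state from GET /_snapshot/my_backup/snapshot_3.

```python
import time

# States after which a snapshot no longer changes.
TERMINAL_STATES = {"SUCCESS", "PARTIAL", "FAILED"}


def wait_for_snapshot(fetch_state, interval_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_state() until the snapshot reaches a terminal state.

    Raises TimeoutError if no terminal state is seen within max_polls attempts.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval_seconds)
    raise TimeoutError("snapshot did not finish within the polling budget")


# Demonstration with canned states instead of a real cluster.
canned = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCESS"])
print(wait_for_snapshot(lambda: next(canned), sleep=lambda seconds: None))
```

Passing the fetch function in keeps the loop independent of any particular HTTP client and makes it easy to exercise without a cluster.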

13.3.2 Viewing Snapshots

# List all snapshots
GET /_snapshot/my_backup/_all

# Get a specific snapshot
GET /_snapshot/my_backup/snapshot_1

# Response (abridged)
{
  "snapshots": [
    {
      "snapshot": "snapshot_1",
      "uuid": "abc123",
      "version": "8.12.0",
      "indices": ["products", "orders"],
      "state": "SUCCESS",
      "start_time": "2024-01-15T10:00:00Z",
      "end_time": "2024-01-15T10:15:00Z",
      "duration_in_millis": 900000,
      "failures": [],
      "shards": {
        "total": 10,
        "failed": 0,
        "successful": 10
      }
    }
  ]
}
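
A response like this is easy to check in a script. The Python sketch below, using hypothetical helper names, reads only the snapshot, state, and indices fields to report snapshots that did not succeed and to find the successful snapshots that contain a given index.

```python
def unsuccessful_snapshots(response):
    """Return (name, state) pairs for snapshots whose state is not SUCCESS."""
    return [
        (snap["snapshot"], snap.get("state"))
        for snap in response.get("snapshots", [])
        if snap.get("state") != "SUCCESS"
    ]


def snapshots_containing(response, index_name):
    """Return names of successful snapshots that include index_name."""
    return [
        snap["snapshot"]
        for snap in response.get("snapshots", [])
        if snap.get("state") == "SUCCESS" and index_name in snap.get("indices", [])
    ]


# Sample data shaped like the response above; the second entry is made up.
sample = {
    "snapshots": [
        {"snapshot": "snapshot_1", "state": "SUCCESS", "indices": ["products", "orders"]},
        {"snapshot": "snapshot_2", "state": "PARTIAL", "indices": ["index1", "index2"]},
    ]
}
print(unsuccessful_snapshots(sample))
print(snapshots_containing(sample, "orders"))
```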

13.3.3 Snapshot Status

# Show snapshots currently running in the repository
GET /_snapshot/my_backup/_current

# Detailed per-shard status
GET /_snapshot/my_backup/snapshot_1/_status
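
The status response can be reduced to a simple progress figure. The sketch below assumes the shards_stats object (with done and total counters) that the snapshot status API reports for each snapshot. The helper name and the sample values are made up for illustration.

```python
def shard_progress(status_response):
    """Map each snapshot name to the fraction of its shards that are done (0.0 to 1.0)."""
    progress = {}
    for snap in status_response.get("snapshots", []):
        stats = snap.get("shards_stats", {})
        total = stats.get("total", 0)
        done = stats.get("done", 0)
        progress[snap["snapshot"]] = done / total if total else 0.0
    return progress


# Made-up status payload reduced to the fields this sketch reads.
status = {
    "snapshots": [
        {"snapshot": "snapshot_1", "state": "STARTED",
         "shards_stats": {"done": 4, "failed": 0, "total": 5}}
    ]
}
print(shard_progress(status))
```

Counting shards treats every shard as equal in size, so the figure is only a coarse indicator for indices with uneven shards.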

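13.4 Restore Operations

13.4.1 Basic Restore

# Restore every index in the snapshot
# (the restore fails for any index that already exists and is open)
POST /_snapshot/my_backup/snapshot_1/_restore

# Restore specific indices under new names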
POST /_snapshot/my_backup/snapshot_2/_restore
{
  "indices": ["index1"],
  "rename_pattern": "index(.+)",
  "rename_replacement": "restored_index_$1"
}
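
The effect of rename_pattern and rename_replacement can be previewed before running a restore. Elasticsearch applies a Java regular-expression replacement, so the Python sketch below is only an approximation: it translates Java-style $1 group references into Python's syntax and assumes the pattern itself is valid in both regex dialects. The function name is hypothetical.

```python
import re


def preview_rename(index_name, rename_pattern, rename_replacement):
    """Approximate the name an index would get after a restore with renaming.

    Java-style group references such as $1 are converted to Python's
    explicit group syntax before the substitution is applied.
    """
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index_name)


for name in ["index1", "index2", "orders"]:
    print(name, "->", preview_rename(name, "index(.+)", "restored_index_$1"))
```

An index whose name does not match the pattern keeps its original name, which is worth checking for, because it may then collide with an existing index.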

# Check restore progress
GET /_cat/recovery?v

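13.4.2 Restore Options

# Restore with options
# index_settings overrides settings during the restore; ignore_index_settings drops
# settings stored in the snapshot (index.lifecycle.name is used here as an illustration).
# Replicas and refresh are disabled to speed up the restore; re-enable them afterwards.
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": ["products"],
  "index_settings": {
    "index.number_of_replicas": 0,
    "index.refresh_interval": "-1"
  },
  "ignore_index_settings": [
    "index.lifecycle.name"
  ],
  "include_aliases": false
}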

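13.4.3 Partial Restore

# Allow restoring an index even if the snapshot is missing some of its shards;
# only the shards present in the snapshot are restored and the missing ones are recreated empty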
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": ["products"],
  "partial": true
}

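13.5 Data Migration

13.5.1 Cross-Cluster Restore

# A URL repository is read-only and must point at the repository files themselves
# (served over http, https, ftp, file, or jar), not at the source cluster's REST API.
# backup-host.example.com is a placeholder for a server exposing the files of my_backup.

# elasticsearch.yml on every node of the target cluster
# (static setting; it cannot be changed through PUT /_cluster/settings)
repositories.url.allowed_urls: ["http://backup-host.example.com/elasticsearch/*"]

# Register the read-only repository on the target cluster
PUT /_snapshot/remote_backup
{
  "type": "url",
  "settings": {
    "url": "http://backup-host.example.com/elasticsearch/"
  }
}

# Restore from the remote repository
POST /_snapshot/remote_backup/snapshot_1/_restore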

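13.5.2 Migration with Reindex

# Reindex from a remote cluster
# The source host must be listed in reindex.remote.whitelist in elasticsearch.yml on the destination cluster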
POST /_reindex
{
  "source": {
    "remote": {
      "host": "http://source-cluster:9200",
      "username": "elastic",
      "password": "password"
    },
    "index": "source_index",
    "size": 10000,
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "dest_index"
  }
}

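# Reindex with a transformation (documents without a category are left unchanged)
POST /_reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  },
  "script": {
    "source": "if (ctx._source.category != null) { ctx._source.category = ctx._source.category.toUpperCase() }",
    "lang": "painless"
  }
}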

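13.5.3 Bulk Reindex

# Reindex a large remote index in a single task, renaming a field on the way
# (reindex from remote does not support slicing, so this request is not parallelized)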
POST /_reindex
{
  "source": {
    "remote": {
      "host": "http://source-cluster:9200"
    },
    "index": "large_index",
    "size": 5000
  },
  "dest": {
    "index": "new_index"
  },
  "script": {
    "source": """
      ctx._source.timestamp = ctx._source['@timestamp'];
      ctx._source.remove('@timestamp');
    """,
    "lang": "painless"
  },
  "conflicts": "proceed"
}
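
To make the effect of the script concrete, the Python function below mirrors it on a plain dictionary. It is only an illustration for checking expectations against a sample document, not how Elasticsearch executes Painless. The function name and the sample document are made up.

```python
def rename_timestamp(source):
    """Return a copy of a document _source with '@timestamp' moved to 'timestamp'.

    Like the script, a document without '@timestamp' ends up with a null timestamp.
    """
    doc = dict(source)
    doc["timestamp"] = doc.get("@timestamp")
    doc.pop("@timestamp", None)
    return doc


sample_doc = {"@timestamp": "2024-01-15T10:00:00Z", "message": "hello"}
print(rename_timestamp(sample_doc))
```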

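13.6 Automated Backups

13.6.1 Snapshot Lifecycle Management (SLM)

# Create an SLM policy
# The schedule uses Elasticsearch cron syntax, which starts with a seconds field;
# this one runs daily at 02:00. The name uses date math and must be wrapped in angle brackets.
PUT /_slm/policy/daily-snapshot
{
  "schedule": "0 0 2 * * ?",
  "name": "<daily-snapshot-{now/d}>",
  "repository": "my_backup",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": true,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}

# View SLM statistics
GET /_slm/stats

# Run the policy immediately
POST /_slm/policy/daily-snapshot/_execute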
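
The three retention settings interact: expire_after marks snapshots as eligible for deletion, min_count keeps that many even when they have expired, and max_count caps the total even when nothing has expired. The Python sketch below is a simplified model of those documented rules, not the actual SLM implementation. It works on snapshot ages in days, and the function name is hypothetical.

```python
def snapshots_to_delete(ages_in_days, expire_after_days=30, min_count=5, max_count=50):
    """Return the ages of snapshots a retention pass would remove, oldest first."""
    newest_first = sorted(ages_in_days)
    keep = newest_first[:max_count]      # never keep more than max_count
    delete = newest_first[max_count:]    # everything beyond the cap goes, oldest ones
    # Drop expired snapshots, oldest first, but never go below min_count.
    while len(keep) > min_count and keep[-1] > expire_after_days:
        delete.append(keep.pop())
    return sorted(delete, reverse=True)


# Forty daily snapshots, aged 1 to 40 days: the ten older than 30 days are removed.
print(snapshots_to_delete(range(1, 41)))
```

With only three snapshots, all older than 30 days, the model removes nothing because min_count is 5.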

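13.6.2 Backup Script

#!/bin/bash
# backup.sh: create a snapshot, then delete snapshots older than the retention period
set -euo pipefail

ES_URL="localhost:9200"
ES_AUTH="elastic:password"   # placeholder credentials; load real ones from a protected source
REPO="my_backup"             # repository name registered in 13.2.1
RETENTION_DAYS=30

# Create the snapshot and wait for it to finish
SNAPSHOT_NAME="backup-$(date +%Y%m%d-%H%M%S)"

curl -sf -X PUT "${ES_URL}/_snapshot/${REPO}/${SNAPSHOT_NAME}?wait_for_completion=true" \
  -u "${ES_AUTH}" \
  -H 'Content-Type: application/json' \
  -d '{
    "indices": ["*"],
    "ignore_unavailable": true,
    "include_global_state": false
  }'

# Delete expired snapshots (date -d requires GNU date).
# Only finished snapshots are considered, so a running snapshot is never removed.
CUTOFF="$(date -u -d "-${RETENTION_DAYS} days" +%Y-%m-%dT%H:%M:%S)"
EXPIRED=$(curl -sf -u "${ES_AUTH}" "${ES_URL}/_snapshot/${REPO}/_all" | \
  jq -r --arg cutoff "$CUTOFF" \
    '.snapshots[] | select(.state == "SUCCESS" and .start_time < $cutoff) | .snapshot')

for snapshot in $EXPIRED; do
  echo "Deleting expired snapshot: $snapshot"
  curl -sf -X DELETE "${ES_URL}/_snapshot/${REPO}/${snapshot}" \
    -u "${ES_AUTH}"
done

echo "Backup completed: $SNAPSHOT_NAME"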
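
The date handling in a script like this is the part most likely to go wrong, so it can help to test it in isolation. The Python sketch below reproduces the naming scheme and a cutoff-based selection on sample data, without contacting a cluster. It is a variation rather than a translation of the script: it selects by start_time and skips snapshots that are not in the SUCCESS state. The function names and sample snapshots are assumptions.

```python
from datetime import datetime, timedelta, timezone


def snapshot_name(now):
    """Build a name in the backup-YYYYmmdd-HHMMSS format used by the script."""
    return now.strftime("backup-%Y%m%d-%H%M%S")


def expired_snapshots(snapshots, now, retention_days=30):
    """Return names of successful snapshots started before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for snap in snapshots:
        if snap.get("state") != "SUCCESS":
            continue  # leave running, partial, or failed snapshots alone
        started = datetime.fromisoformat(snap["start_time"].replace("Z", "+00:00"))
        if started < cutoff:
            expired.append(snap["snapshot"])
    return expired


now = datetime(2024, 1, 15, 2, 0, 0, tzinfo=timezone.utc)
sample_snapshots = [
    {"snapshot": "backup-old", "state": "SUCCESS", "start_time": "2023-12-01T02:00:00.000Z"},
    {"snapshot": "backup-recent", "state": "SUCCESS", "start_time": "2024-01-10T02:00:00.000Z"},
    {"snapshot": "backup-running", "state": "IN_PROGRESS", "start_time": "2023-11-01T02:00:00.000Z"},
]
print(snapshot_name(now))
print(expired_snapshots(sample_snapshots, now))
```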

13.7 Restore Verification

13.7.1 Verifying a Restore

# List the restored indices
GET /_cat/indices/restored_*

# Check the document count
GET /restored_index/_count

# Spot-check a sample of documents
GET /restored_index/_search
{
  "size": 10,
  "query": {
    "match_all": {}
  }
}

13.7.2 Comparing Data

# Compare document counts between the source and restored indices
GET /source_index/_count
GET /restored_index/_count

# Compare a specific document
GET /source_index/_doc/123
GET /restored_index/_doc/123
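
Comparing the responses by eye does not scale beyond a few documents. The Python sketch below shows one way to compare two _count responses and to list the fields that differ between two _doc responses. The helper names and sample payloads are assumptions, and only the count and _source fields of the responses are read.

```python
def counts_match(source_count, restored_count):
    """Compare the count field of two GET /<index>/_count responses."""
    return source_count["count"] == restored_count["count"]


def diff_sources(source_doc, restored_doc):
    """Return {field: (source value, restored value)} for fields that differ
    between the _source of two GET /<index>/_doc/<id> responses."""
    a = source_doc.get("_source", {})
    b = restored_doc.get("_source", {})
    return {
        field: (a.get(field), b.get(field))
        for field in sorted(set(a) | set(b))
        if a.get(field) != b.get(field)
    }


# Made-up responses for document 123.
src = {"_id": "123", "_source": {"name": "widget", "price": 10}}
dst = {"_id": "123", "_source": {"name": "widget", "price": 12}}
print(counts_match({"count": 1000}, {"count": 1000}))
print(diff_sources(src, dst))
```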

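13.8 Troubleshooting

13.8.1 Handling Restore Failures

| Problem | Cause | Solution |
|---|---|---|
| Unassigned shards | Insufficient disk space | Free disk space or add capacity |
| Index already exists | An index with the same name exists | Restore with rename_pattern and rename_replacement, or close or delete the existing index |
| Repository unavailable | Network problem | Check connectivity and the repository settings, then verify the repository again |
| Version incompatibility | Snapshot taken on an incompatible Elasticsearch version | Restore into a cluster running the same or a newer compatible version; a snapshot cannot be restored into an older version |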

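13.8.2 Locked Repository

# Do not delete or edit files inside a repository by hand; manual changes can corrupt it.
# Find operations that are still using the repository
GET /_snapshot/my_backup/_current

# Remove unreferenced data left behind by failed operations
POST /_snapshot/my_backup/_cleanup

# Unregister a corrupted repository (the files in storage are not deleted)
DELETE /_snapshot/corrupted_backup
# Register the repository again; the request needs a body, for example the one from 13.2.1
PUT /_snapshot/backup
{
  "type": "fs",
  "settings": { "location": "/backup/elasticsearch" }
}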

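13.9 Best Practices

13.9.1 Backup Strategy

□ Take daily snapshots and keep them for 30 days
□ Take a weekly full backup and keep it for 90 days
□ Replicate critical data across regions or clusters
□ Test the restore procedure regularly
□ Monitor snapshot size and retention
□ Back up configuration and mapping definitions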

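13.9.2 Restore Plan

# Restore checklist
1. Confirm the snapshot state is SUCCESS
2. Check free disk space on the target cluster
3. Confirm index names do not conflict
4. Verify data integrity
5. Point aliases at the new indices
6. Clean up temporary restore indices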
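
The first three items of the checklist can be turned into an automated pre-flight check. The Python sketch below is one possible shape for it. The function name, its parameters, and the idea of passing disk figures in as plain numbers are assumptions of this sketch. In practice the inputs would come from the get snapshot API, the list of existing indices, and disk usage figures for the target cluster.

```python
def restore_preflight(snapshot, existing_indices, free_bytes, required_bytes):
    """Return a list of problems that should block a restore; empty means go ahead."""
    problems = []
    state = snapshot.get("state")
    if state != "SUCCESS":
        problems.append(f"snapshot state is {state}, expected SUCCESS")
    if required_bytes > free_bytes:
        problems.append("not enough free disk space on the target cluster")
    clashes = sorted(set(snapshot.get("indices", [])) & set(existing_indices))
    if clashes:
        problems.append("index names already exist: " + ", ".join(clashes))
    return problems


snapshot_info = {"snapshot": "snapshot_1", "state": "SUCCESS", "indices": ["products", "orders"]}
print(restore_preflight(snapshot_info, ["orders"], free_bytes=100, required_bytes=50))
```

Items 4 to 6 happen after the restore and still need the verification steps from section 13.7.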

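13.10 Summary

This chapter covered backup and restore in Elasticsearch: configuring snapshot repositories, creating and restoring snapshots, and migrating data. A sound backup strategy is a key safeguard for your data.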