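Chapter 2: Installation and Deployment

This chapter covers the main ways to install Prometheus (binary, Docker, and Kubernetes), along with configuration management.

Last updated: 2024-01-01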

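2.1 Requirements

2.1.1 Hardware

Prometheus has modest hardware requirements, but they grow with the number of targets and active time series. The figures below are rough starting points: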

Scale                         CPU         Memory    Disk
Small (< 100 targets)         2 cores     4 GB      50 GB SSD
Medium (100-500 targets)      4 cores     8 GB      100 GB SSD
Large (500-2000 targets)      8 cores     16 GB     200 GB SSD
Very large (> 2000 targets)   16+ cores   32+ GB    500+ GB SSD
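
The table above is coarse. For disk in particular, the Prometheus storage documentation gives a rough formula: needed space = retention time in seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample. The sketch below only does that arithmetic; the function name and the input numbers are illustrative assumptions, not measurements.

```shell
# Rough TSDB disk estimate:
#   bytes = retention_seconds * samples_per_second * bytes_per_sample
# samples_per_second is approximated as active_series / scrape_interval.
estimate_disk_gib() {
  series=$1            # active time series (assumed input)
  interval_s=$2        # scrape interval in seconds
  retention_days=$3    # retention in days
  bytes_per_sample=${4:-2}

  samples_per_s=$(( series / interval_s ))
  bytes=$(( retention_days * 86400 * samples_per_s * bytes_per_sample ))

  # Integer GiB, rounded up.
  echo $(( (bytes + 1073741823) / 1073741824 ))
}

# Hypothetical workload: 100 targets x 1000 series each, 15s interval, 30 days.
estimate_disk_gib 100000 15 30
```

Leave headroom on top of the result for the write-ahead log and compaction.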

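2.1.2 Software

  • Operating system: Linux (recommended), macOS, Windows
  • Dependencies: Go 1.19+ (only when building from source)
  • Network: port 9090 for the Prometheus web UI and HTTP API; exporters listen on their own ports (for example, node_exporter defaults to 9100)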

2.2 Binary Installation

2.2.1 Download Prometheus

# Pick a version and platform
PROMETHEUS_VERSION="2.47.0"
ARCH="linux-amd64"

# Download
wget "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.${ARCH}.tar.gz"

# Extract
tar xzf "prometheus-${PROMETHEUS_VERSION}.${ARCH}.tar.gz"
cd "prometheus-${PROMETHEUS_VERSION}.${ARCH}"
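
Each Prometheus release also publishes a sha256sums.txt file, so the download can be checked before it is unpacked. The following is a minimal sketch assuming GNU coreutils (for sha256sum --ignore-missing) and wget; the function names are hypothetical.

```shell
PROMETHEUS_VERSION="${PROMETHEUS_VERSION:-2.47.0}"
ARCH="${ARCH:-linux-amd64}"

# Build the download URL for a file attached to the release.
release_url() {
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s/%s\n' \
    "$PROMETHEUS_VERSION" "$1"
}

# Download the tarball and the checksum list, then verify the tarball.
download_and_verify() {
  tarball="prometheus-${PROMETHEUS_VERSION}.${ARCH}.tar.gz"
  wget -q "$(release_url "$tarball")" "$(release_url sha256sums.txt)" || return 1
  # --ignore-missing skips checksum entries for files that were not downloaded.
  sha256sum --check --ignore-missing sha256sums.txt
}

# Usage:
#   download_and_verify && tar xzf "prometheus-${PROMETHEUS_VERSION}.${ARCH}.tar.gz"
```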

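2.2.2 Directory Layout

prometheus-2.47.0.linux-amd64/
├── prometheus              # server binary
├── promtool                # command-line utility (config checks, TSDB tools)
├── consoles/               # console templates
├── console_libraries/      # console template libraries
├── LICENSE
├── NOTICE
├── prometheus.yml          # example configuration file
└── data/                   # data directory (created at runtime)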

2.2.3 Start Prometheus

# Basic start
./prometheus --config.file=prometheus.yml

# Specify the data directory and listen address, and enable the /-/reload and /-/quit endpoints
./prometheus \
  --config.file=prometheus.yml \
  --storage.tsdb.path=/data/prometheus \
  --web.listen-address=:9090 \
  --web.enable-lifecycle

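# Start at boot (systemd). Assumes the release was unpacked to /usr/local/prometheus,
# the config lives at /etc/prometheus/prometheus.yml, and a "prometheus" system user
# owns /var/lib/prometheus.
sudo tee /etc/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus Monitoring System
Wants=network-online.target
After=network-online.target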

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus

2.2.4 Verify the Installation

# Check the process
pgrep -a prometheus

# Check the listening port
ss -tlnp | grep 9090

# Query the API
curl http://localhost:9090/api/v1/status/runtimeinfo
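
Right after a start, the API may not answer yet, for example while the write-ahead log is being replayed. Prometheus exposes a /-/ready endpoint for this case, and a script can poll it before running further checks. A minimal sketch, assuming curl is installed; the function name and default timeout are illustrative.

```shell
# Poll /-/ready until it returns HTTP 200 or the timeout expires.
wait_for_prometheus() {
  url=${1:-http://localhost:9090}
  timeout_s=${2:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # -f makes curl exit non-zero on HTTP errors such as 503.
    if curl -fsS -o /dev/null --max-time 2 "$url/-/ready" 2>/dev/null; then
      echo "ready after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "not ready after ${timeout_s}s" >&2
  return 1
}

# Usage:
#   wait_for_prometheus http://localhost:9090 120 && curl http://localhost:9090/api/v1/status/runtimeinfo
```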

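2.3 Docker Installation

2.3.1 Basic Run

# Create the data directory. The official image runs as "nobody" (UID 65534),
# so the bind-mounted directory must be writable by that user.
mkdir -p /data/prometheus
sudo chown 65534:65534 /data/prometheus

# Run the container (for reproducible deployments, pin a version tag instead of "latest")
docker run -d \
  --name prometheus \
  -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v /data/prometheus:/prometheus \
  prom/prometheus:latest \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus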

2.3.2 Deploying with Docker Compose

# docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

# Start
docker-compose up -d

# Follow the logs
docker-compose logs -f prometheus

# Stop
docker-compose down

2.3.3 Production-Grade Configuration

# docker-compose.prod.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:v2.47.0
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./rules:/etc/prometheus/rules:ro
      - ./alerts:/etc/prometheus/alerts:ro
      - prometheus_data:/prometheus
    environment:
      - TZ=Asia/Shanghai
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--storage.tsdb.wal-compression'
      - '--web.enable-lifecycle'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  prometheus_data:

2.4 Kubernetes Deployment

2.4.1 Deploying with Helm

# Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create the namespace
kubectl create namespace monitoring

# Install Prometheus. Value keys vary between chart versions; confirm them with
# "helm show values prometheus-community/prometheus" before relying on these.
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --set server.persistentVolume.enabled=true \
  --set server.persistentVolume.size=50Gi \
  --set alertmanager.enabled=true \
  --set prometheus-node-exporter.enabled=true

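2.4.2 Custom values.yaml

# values.yaml
# Key names follow recent versions of the prometheus-community/prometheus chart,
# where the exporters are subcharts. Older chart versions used nodeExporter,
# kubeStateMetrics, and pushgateway instead.
server:
  replicaCount: 2
  # With persistent storage, multiple replicas need a StatefulSet
  # so that each replica gets its own volume.
  statefulSet:
    enabled: true
  persistentVolume:
    enabled: true
    size: 100Gi
    storageClass: "ssd"
  retention: "30d"
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 2
      memory: 4Gi

alertmanager:
  enabled: true
  replicaCount: 2
  persistence:
    enabled: true
    size: 10Gi

kube-state-metrics:
  enabled: true

prometheus-node-exporter:
  enabled: true

prometheus-pushgateway:
  enabled: true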

2.4.3 Deploying with Kustomize

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: monitoring

resources:
  - namespace.yaml
  - prometheus-deployment.yaml
  - prometheus-service.yaml
  - prometheus-configmap.yaml
  - prometheus-rbac.yaml

commonLabels:
  app: prometheus
  environment: production

2.5 Configuration Management

2.5.1 Configuration File Structure

# prometheus.yml
global:
  # Global defaults
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'prod-us-east'
    env: 'production'

alerting:
  # Alertmanager endpoints
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  # Recording and alerting rule files (relative paths resolve against this file's directory)
  - "rules/*.yml"
  - "alerts/*.yml"

scrape_configs:
  # Scrape targets
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
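
The rule_files globs above only take effect once matching files exist. As an illustration, the sketch below writes one alerting rule into a rules/ directory so it can be checked with promtool check rules. The function name, file name, alert name, and threshold are assumptions chosen for the example, not part of the original configuration.

```shell
# Write a sample alerting rule file into the given directory (default: rules).
write_example_rules() {
  dir=${1:-rules}
  mkdir -p "$dir"
  # The quoted 'EOF' keeps {{ $labels.instance }} literal, so the shell does
  # not expand it and Prometheus can template it at alert time.
  cat > "$dir/example.yml" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF
}

# Usage:
#   write_example_rules rules && ./promtool check rules rules/example.yml
```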

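2.5.2 Configuration via Environment Variables

Prometheus does not expand environment variables in prometheus.yml; a placeholder such as ${SCRAPE_INTERVAL} is read as a literal string. The one exception is external_labels values, which are expanded when the server is started with --enable-feature=expand-external-labels. For anything else, keep a template and render it with a tool such as envsubst before starting Prometheus.

# prometheus.yml.tmpl (a template rendered before start; Prometheus does not read it directly)
global:
  scrape_interval: ${SCRAPE_INTERVAL}
  external_labels:
    environment: ${ENVIRONMENT}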
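
Because Prometheus itself does not substitute environment variables into the configuration file, one option is to generate the file from the shell, which also allows default values. A minimal sketch using a heredoc and the shell's ${VAR:-default} expansion; the function name and output path are assumptions.

```shell
# Print a prometheus.yml fragment, using environment variables where set
# and falling back to defaults otherwise.
render_prometheus_config() {
  cat <<EOF
global:
  scrape_interval: ${SCRAPE_INTERVAL:-15s}
  external_labels:
    environment: ${ENVIRONMENT:-production}
EOF
}

# Usage:
#   SCRAPE_INTERVAL=30s render_prometheus_config > prometheus.yml
#   ./promtool check config prometheus.yml
```

Running promtool check config on the rendered file catches bad values, such as an invalid duration, before Prometheus loads them.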

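2.5.3 Hot Reloading the Configuration

With --web.enable-lifecycle enabled, Prometheus reloads its configuration on an HTTP POST to /-/reload. Validate the file first; an invalid configuration is not applied and the previous one stays in effect.

# Validate the configuration (promtool has no reload subcommand)
./promtool check config prometheus.yml

# Reload over HTTP
curl -X POST http://localhost:9090/-/reload

# Or send SIGHUP, which works without --web.enable-lifecycle
kill -HUP "$(pgrep -x prometheus)"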
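
Validation and reload can be combined so that a broken file never triggers a reload. A minimal sketch, assuming promtool and curl are on PATH and the lifecycle endpoint is enabled; the function name is hypothetical.

```shell
# Reload Prometheus only if the configuration file validates.
reload_prometheus() {
  config=${1:-prometheus.yml}
  url=${2:-http://localhost:9090}

  if ! promtool check config "$config"; then
    echo "config invalid, not reloading" >&2
    return 1
  fi
  # -f makes curl fail on HTTP errors, e.g. when the lifecycle API is disabled.
  curl -fsS -X POST "$url/-/reload"
}

# Usage:
#   reload_prometheus /etc/prometheus/prometheus.yml http://localhost:9090
```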

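2.6 Upgrade and Migration

2.6.1 Upgrade Steps

# 1. Download and unpack the new version
PROMETHEUS_VERSION="2.48.0"
wget "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
tar xzf "prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"

# 2. Stop the service, then back up the data directory
#    (copying while Prometheus is running can produce an inconsistent copy;
#    the path matches the systemd unit above, so adjust it to your --storage.tsdb.path)
sudo systemctl stop prometheus
sudo cp -a /var/lib/prometheus/data /var/lib/prometheus/data.bak

# 3. Swap in the new release and start it
sudo mv /usr/local/prometheus /usr/local/prometheus.old
sudo mv "prometheus-${PROMETHEUS_VERSION}.linux-amd64" /usr/local/prometheus
sudo systemctl start prometheus

# 4. Verify
curl http://localhost:9090/api/v1/status/runtimeinfo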
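
Where stopping the server for a backup is not acceptable, the TSDB admin API can take a snapshot of a running instance. It is available only when Prometheus is started with --web.enable-admin-api, and the snapshot lands under snapshots/<name> inside the data directory. The sketch below assumes curl is installed; the function name is hypothetical, and the response is parsed with sed to avoid depending on jq.

```shell
# Ask a running Prometheus for a TSDB snapshot and print the snapshot name.
snapshot_tsdb() {
  url=${1:-http://localhost:9090}
  resp=$(curl -fsS -X POST "$url/api/v1/admin/tsdb/snapshot") || return 1
  # Response shape: {"status":"success","data":{"name":"<snapshot-name>"}}
  name=$(printf '%s' "$resp" | sed -n 's/.*"name":"\([^"]*\)".*/\1/p')
  [ -n "$name" ] || return 1
  echo "$name"
}

# Usage:
#   name=$(snapshot_tsdb) && echo "snapshot at <data-dir>/snapshots/$name"
```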

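2.6.2 Data Migration

# Dump samples from the TSDB as text for inspection
# (this neither repairs data nor produces a restorable backup)
./promtool tsdb dump /data/prometheus > tsdb-dump.txt

# Analyze series churn and label cardinality (not a consistency check)
./promtool tsdb analyze /data/prometheus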

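2.7 Summary

This chapter covered several ways to install and deploy Prometheus:

  1. Binary installation: suits users comfortable with Linux administration
  2. Docker deployment: containerized, keeps environments consistent
  3. Kubernetes deployment: the natural fit for cloud-native environments
  4. Configuration management: hot reloading, plus templating for environment-specific values
  5. Upgrade and migration: a step-by-step version upgrade procedure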
