Chapter 3: Configuration in Depth

A detailed look at the structure of the Prometheus configuration file, including the core sections global, scrape_configs, rule_files, and alerting.

Last updated: 2024-01-01

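3.1 Configuration File Overview

Prometheus is configured through a YAML file made up of the following main sections:

# Global settings
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager settings
alerting:
  alertmanagers:

# Rule files
rule_files:

# Scrape settings
scrape_configs: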

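3.2 Global Configuration

Global settings provide the defaults for every job; scrape_interval and scrape_timeout can be overridden in an individual scrape config:

global:
  # Default scrape interval
  scrape_interval: 15s

  # Default rule evaluation interval
  evaluation_interval: 15s

  # Labels added when communicating with external systems (federation, remote storage, Alertmanager)
  external_labels:
    cluster: 'prod-us-east'
    datacenter: 'dc1'
    env: 'production'

  # Default scrape timeout (must not exceed scrape_interval)
  scrape_timeout: 10s

# The protocol scheme and TLS client settings are not valid in the global section;
# they are set per job under scrape_configs (the job name here is a placeholder)
scrape_configs:
  - job_name: 'example'
    scheme: https  # or http
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      insecure_skip_verify: false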
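
As a small illustration of the override behaviour described in this section, the sketch below pairs one job that inherits the global values with one that sets its own. The job names and targets are hypothetical placeholders, not part of this chapter's environment.

```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s

scrape_configs:
  # Inherits scrape_interval (15s) and scrape_timeout (10s) from global
  - job_name: 'inherits-defaults'   # hypothetical job name
    static_configs:
      - targets: ['app-a:8080']     # hypothetical target

  # Overrides both values for this job only
  - job_name: 'slow-endpoint'       # hypothetical job name
    scrape_interval: 60s
    scrape_timeout: 30s
    static_configs:
      - targets: ['app-b:8080']     # hypothetical target
```

Keeping scrape_timeout no larger than scrape_interval matters here, because Prometheus rejects a job whose timeout exceeds its interval.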

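3.3 Scrape Configs in Detail

3.3.1 Static Configuration

scrape_configs:
  # Job name
  - job_name: 'prometheus'

    # Scrape interval (overrides the global value)
    scrape_interval: 30s

    # Scrape timeout
    scrape_timeout: 10s

    # Metrics path
    metrics_path: /metrics

    # Protocol scheme
    scheme: https

    # Static list of targets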
    static_configs:
      - targets:
          - 'localhost:9090'
          - 'localhost:9100'
        labels:
          group: 'infrastructure'
          region: 'us-east'

3.3.2 File-Based Service Discovery

scrape_configs:
  - job_name: 'file-sd'
    # File paths (a glob is allowed in the last path segment)
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.yml
          - /etc/prometheus/targets/*.yaml
        refresh_interval: 1m

# targets/server.yml
- targets:
    - '192.168.1.10:9090'
    - '192.168.1.11:9090'
  labels:
    service: api
    env: prod

3.3.3 DNS Service Discovery

scrape_configs:
  - job_name: 'dns-sd'
    dns_sd_configs:
      - names:
          - 'prometheus-instances.default.svc.cluster.local'
        type: 'A'
        port: 9090
        refresh_interval: 30s
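
DNS discovery can also use SRV records, in which case the port comes from the record itself instead of the port field. The following is a minimal sketch; the record name is a hypothetical placeholder.

```yaml
scrape_configs:
  - job_name: 'dns-sd-srv'              # hypothetical job name
    dns_sd_configs:
      - names:
          # Hypothetical SRV record; each answer supplies host and port
          - '_metrics._tcp.example.internal'
        type: 'SRV'
        refresh_interval: 30s
```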

3.3.4 Kubernetes Service Discovery

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        api_server: https://kubernetes.default.svc:443
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    # Relabel configuration
    relabel_configs:
      # Scrape only pods that carry the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__

      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)

Supported Kubernetes roles:

Role       Description
node       Cluster nodes
service    Kubernetes Services
pod        Kubernetes Pods
endpoints  Service endpoints
ingress    Ingress resources
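
The pod relabel rules in this section rely on annotations set on the workload side. The manifest below is a sketch of a pod that those rules would select; the pod name, image, and port are hypothetical, and the prometheus.io/* annotations are a widely used convention rather than something Prometheus itself defines.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # hypothetical pod name
  labels:
    app: example-app                 # copied to a Prometheus label by labelmap
  annotations:
    prometheus.io/scrape: "true"     # matched by the keep rule
    prometheus.io/path: "/metrics"   # copied into __metrics_path__
    prometheus.io/port: "8080"       # merged into __address__
spec:
  containers:
    - name: app
      image: example/app:1.0         # hypothetical image
      ports:
        - containerPort: 8080
```

Annotation values must be strings, which is why "true" and "8080" are quoted.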

3.3.5 EC2 Service Discovery

scrape_configs:
  - job_name: 'ec2'
    ec2_sd_configs:
      - region: us-east-1
        access_key: AKIAIOSFODNN7EXAMPLE
        secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
        filters:
          - name: tag:Environment
            values:
              - production
        port: 9100
        refresh_interval: 1m
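
Discovered EC2 instances expose their metadata as __meta_ec2_* labels, which can be turned into readable target labels. The sketch below assumes the instances carry a Name tag and that credentials come from the environment instead of being written into the file; the job name and the zone label are illustrative choices.

```yaml
scrape_configs:
  - job_name: 'ec2-relabel'            # hypothetical job name
    ec2_sd_configs:
      # access_key / secret_key omitted on purpose: this assumes credentials
      # are supplied by the environment (for example AWS_ACCESS_KEY_ID)
      - region: us-east-1
        port: 9100
    relabel_configs:
      # Use the EC2 "Name" tag as the instance label
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance
      # Record the availability zone as a regular label
      - source_labels: [__meta_ec2_availability_zone]
        target_label: zone
```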

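3.4 Relabel Configuration

relabel_configs is one of the most powerful features of Prometheus. It rewrites the labels of discovered targets, and can drop targets entirely, before they are scraped.

3.4.1 Basic Actions

Action     Description
replace    Write the regex-matched value into the target label
keep       Keep only targets whose source labels match the regex
drop       Drop targets whose source labels match the regex
hashmod    Write the hash of the source labels, modulo a number, into the target label
labelmap   Copy labels whose names match the regex to new label names
labeldrop  Remove labels whose names match the regex
labelkeep  Remove labels whose names do not match the regex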
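
This chapter has no dedicated example for labeldrop, so here is a minimal sketch. It assumes a scraped series carries a pod_template_hash label that is not wanted in storage; the job name, target, and label name are hypothetical.

```yaml
scrape_configs:
  - job_name: 'labeldrop-example'      # hypothetical job name
    static_configs:
      - targets: ['app:8080']          # hypothetical target
    # metric_relabel_configs runs on scraped samples, after the scrape
    metric_relabel_configs:
      # The regex is matched against label names, not label values
      - regex: 'pod_template_hash'
        action: labeldrop
```

With labeldrop and labelkeep it is worth checking that the remaining labels still identify each series uniquely, since two series that end up with identical label sets will collide.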

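3.4.2 The replace Action

relabel_configs:
  # Derive the instance label from the address, dropping the port
  - source_labels: [__address__]
    target_label: instance
    regex: '(.+):\d+'
    replacement: '${1}'
    action: replace

  # Combine several labels: the values are joined with the separator,
  # and the default regex (.*) captures the joined string as ${1}
  - source_labels: [env, region]
    separator: '-'
    target_label: environment
    replacement: '${1}'
    action: replace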

3.4.3 The keep and drop Actions

relabel_configs:
  # Keep only targets that belong to the given service
  - source_labels: [__meta_kubernetes_service_name]
    regex: 'nginx-ingress-controller'
    action: keep

  # Drop targets in the test environment
  - source_labels: [env]
    regex: 'test'
    action: drop
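
relabel_configs filters whole targets before a scrape. To filter individual series after the scrape, the same keep and drop actions can be placed under metric_relabel_configs. A minimal sketch, assuming a node-exporter job whose Go runtime metrics are not needed (the regex is an illustrative choice):

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
    metric_relabel_configs:
      # __name__ holds the metric name; drop every series starting with go_gc_
      - source_labels: [__name__]
        regex: 'go_gc_.*'
        action: drop
```

Dropped series are still transferred over the network on every scrape; they are only discarded before being stored.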

3.4.4 The labelmap Action

relabel_configs:
  # Copy Kubernetes pod labels to Prometheus labels.
  # labelmap matches the regex against label names, so source_labels is not used
  - regex: '__meta_kubernetes_pod_label_(.+)'
    action: labelmap
    replacement: '${1}'

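3.4.5 The hashmod Action

relabel_configs:
  # Hash-partition targets by address (the instance label is normally
  # not set yet while target relabeling runs, so __address__ is used)
  - source_labels: [__address__]
    target_label: __tmp_hash
    modulus: 4
    action: hashmod

  # Keep only the targets that fall into shard 0
  - source_labels: [__tmp_hash]
    regex: '0'
    action: keep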
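
For horizontal sharding, each Prometheus server runs the same hashmod rule and differs only in the shard number it keeps. The sketch below shows what the second of four servers might use; it assumes the shards are numbered 0 to 3 and that every server sees the same target list.

```yaml
relabel_configs:
  # Same hash as on every other shard, so the assignment is consistent
  - source_labels: [__address__]
    target_label: __tmp_hash
    modulus: 4
    action: hashmod

  # This server is shard 1; the other servers keep '0', '2', and '3'
  - source_labels: [__tmp_hash]
    regex: '1'
    action: keep
```

Labels that begin with __ are removed once relabeling finishes, so __tmp_hash never appears on stored series.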

3.5 Alerting Configuration

alerting:
  alertmanagers:
    # Static configuration
    - static_configs:
        - targets:
            - alertmanager:9093

    # Kubernetes service discovery
    - kubernetes_sd_configs:
        - role: pod
          namespaces:
            names:
              - monitoring
      # Relabeling selects the Alertmanager pods; relabel_configs belongs to the
      # alertmanager entry, not to the service discovery block
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app]
          regex: alertmanager
          action: keep
        - source_labels: [__meta_kubernetes_pod_container_port_number]
          regex: "9093"
          action: keep
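
The alerting section also accepts alert_relabel_configs, which rewrites alert labels before alerts are sent to Alertmanager. A common use, sketched here under the assumption that each server in a high-availability pair sets an external label named replica (a hypothetical label name), is to strip that label so both servers send identical alerts that Alertmanager can deduplicate.

```yaml
alerting:
  alert_relabel_configs:
    # Remove the per-server label so alerts from both replicas match
    - regex: 'replica'
      action: labeldrop
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
```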

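3.6 Rule Files

rule_files:
  # Absolute path with a glob
  - "/etc/prometheus/rules/*.yml"

  # Relative path, resolved against the directory of the configuration file.
  # Globs follow Go's filepath.Glob rules, so "**" does not recurse; use one "*" per directory level
  - "rules/*/*.yml"

  # A glob that matches no files is not an error, which makes this directory optional
  - "rules/optional/*.yml"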
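
For reference, a file matched by these patterns contains rule groups. The following is a minimal sketch with one recording rule and one alerting rule; the group name, rule names, threshold, and label values are illustrative assumptions.

```yaml
groups:
  - name: example-rules                # hypothetical group name
    rules:
      # Recording rule: precompute the share of healthy targets per job
      - record: job:up:avg
        expr: avg by (job) (up)

      # Alerting rule: fire when a target has been down for 5 minutes
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'Instance {{ $labels.instance }} is down'
```

Groups are evaluated at the global evaluation_interval unless a group sets its own interval.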

3.7 Complete Configuration Examples

3.7.1 Production Configuration

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'prod'
    env: 'production'
    datacenter: 'us-east-1'
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.monitoring.svc:9093

rule_files:
  - /etc/prometheus/rules/*.yml
  - /etc/prometheus/alerts/*.yml

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          service: prometheus

  # Kubernetes API Server
  - job_name: 'kubernetes-apiserver'
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://kubernetes.default.svc:443
        namespaces:
          names:
            - default
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: default;kubernetes

  # Kubernetes Nodes
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: instance
        replacement: '${1}'

  # Kubernetes Pods
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

  # Exporters for common services
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'node-exporter:9100'
        labels:
          service: node

  - job_name: 'mysqld-exporter'
    static_configs:
      - targets:
          - 'mysqld-exporter:9104'
        labels:
          service: mysql

  - job_name: 'redis-exporter'
    static_configs:
      - targets:
          - 'redis-exporter:9121'
        labels:
          service: redis

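3.7.2 Multi-Environment Configuration

# Prometheus does not expand environment variables in most of its configuration
# file, so render this template with an external tool (for example envsubst)
# before starting Prometheus. Set every variable explicitly, e.g. SCRAPE_INTERVAL=15s,
# because envsubst has no default-value syntax.
# config.template.yml
global:
  scrape_interval: ${SCRAPE_INTERVAL}
  external_labels:
    cluster: '${CLUSTER_NAME}'
    env: '${ENVIRONMENT}'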

3.8 Validating the Configuration

3.8.1 Syntax Check

# Check the configuration file
./promtool check config prometheus.yml

# Check the rule files
./promtool check rules /etc/prometheus/rules/*.yml

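3.8.2 Formatting the Configuration

# promtool has no subcommand for formatting prometheus.yml; a general-purpose
# YAML tool can be used instead (this assumes yq v4 is installed)
yq '.' prometheus.yml > prometheus.formatted.yml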

3.8.3 Reloading the Configuration

# Trigger a reload (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

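3.9 Chapter Summary

This chapter covered the Prometheus configuration system in depth:

  1. Global configuration - global parameters and defaults
  2. Scrape configuration - the available service discovery mechanisms
  3. Relabel configuration - flexible label manipulation
  4. Alerting configuration - Alertmanager integration
  5. Rule files - managing rule files
  6. Configuration validation - syntax checks and reloading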
