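Chapter 3: Configuration in Detail
An in-depth look at the structure of the Prometheus configuration file, covering the core sections global, scrape_configs, rule_files, and alerting
Last updated: 2024-01-01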
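3.1 Configuration File Overview
Prometheus is configured with a YAML file whose main top-level sections are:
```yaml
# Global configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
# Alertmanager configuration
alerting:
  alertmanagers:
# Rule files
rule_files:
# Scrape configuration
scrape_configs:
```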
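3.2 Global Configuration
Settings under `global` apply to every job and can be overridden in individual jobs:
```yaml
global:
  # Default scrape interval (Prometheus default: 1m)
  scrape_interval: 15s
  # Default rule evaluation interval (Prometheus default: 1m)
  evaluation_interval: 15s
  # Labels added to series and alerts sent to external systems
  # (federation, remote storage, Alertmanager)
  external_labels:
    cluster: 'prod-us-east'
    datacenter: 'dc1'
    env: 'production'
  # Default scrape timeout; must not exceed scrape_interval (Prometheus default: 10s)
  scrape_timeout: 10s
```
`scheme` and `tls_config` are not valid under `global`, and Prometheus refuses to load a configuration that contains unknown fields. Set them per job in `scrape_configs` instead (the job name below is only a placeholder):
```yaml
scrape_configs:
  - job_name: 'tls-example'
    # Scrape protocol: http (default) or https
    scheme: https
    # Client TLS settings used when scraping this job's targets
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
      insecure_skip_verify: false
```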
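How a job's values fall back to `global` can be modelled in a few lines. The sketch below is our reading of Prometheus's validation rules, not its actual code: defaults are 1m and 10s, job values override global ones, an inherited timeout is capped at the job's interval, and an explicit timeout larger than the interval is rejected. Durations are plain seconds, and `resolve_global` / `resolve_job` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_SCRAPE_INTERVAL = 60.0  # Prometheus default: 1m
DEFAULT_SCRAPE_TIMEOUT = 10.0   # Prometheus default: 10s


@dataclass
class Settings:
    scrape_interval: float
    scrape_timeout: float


def resolve_global(interval: Optional[float], timeout: Optional[float]) -> Settings:
    """Fill in global defaults, then require timeout <= interval (sketch)."""
    interval = DEFAULT_SCRAPE_INTERVAL if interval is None else interval
    if timeout is not None and timeout > interval:
        raise ValueError("global scrape timeout greater than scrape interval")
    if timeout is None:
        # An unset global timeout is capped at the global interval.
        timeout = min(DEFAULT_SCRAPE_TIMEOUT, interval)
    return Settings(interval, timeout)


def resolve_job(glob: Settings, interval: Optional[float],
                timeout: Optional[float]) -> Settings:
    """Job values override global ones; an inherited timeout is capped (sketch)."""
    interval = glob.scrape_interval if interval is None else interval
    if timeout is not None and timeout > interval:
        raise ValueError("scrape timeout greater than scrape interval")
    if timeout is None:
        timeout = min(glob.scrape_timeout, interval)
    return Settings(interval, timeout)
```

With the global block above (15s / 10s), a job with `scrape_interval: 30s` keeps the 10s timeout, while a job with `scrape_interval: 5s` ends up with a 5s timeout.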
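3.3 Scrape Configs in Detail
3.3.1 Static Configuration
```yaml
scrape_configs:
  # Job name
  - job_name: 'prometheus'
    # Scrape interval (overrides the global value)
    scrape_interval: 30s
    # Scrape timeout (must not exceed scrape_interval)
    scrape_timeout: 10s
    # HTTP path to scrape (default: /metrics)
    metrics_path: /metrics
    # Protocol (default: http); use https only if the targets serve TLS
    scheme: http
    # Static list of targets
    static_configs:
      - targets:
          - 'localhost:9090'
          - 'localhost:9100'
        labels:
          group: 'infrastructure'
          region: 'us-east'
```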
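Each static target starts out as a label set: the group's `labels`, the target as `__address__`, and job-level values such as `job`, `__metrics_path__` and `__scheme__`. After relabeling, labels starting with `__` are dropped and `instance` defaults to `__address__`. The sketch below is a simplified model of that flow (relabeling itself is skipped, and the helper names are hypothetical):

```python
from typing import Dict


def initial_labels(job: str, address: str, group_labels: Dict[str, str],
                   metrics_path: str = "/metrics",
                   scheme: str = "http") -> Dict[str, str]:
    """Label set a static target starts with, before relabel_configs (sketch)."""
    labels = dict(group_labels)
    labels["__address__"] = address
    # Job-level values are only filled in if the target did not set them.
    for name, value in (("job", job), ("__metrics_path__", metrics_path),
                        ("__scheme__", scheme)):
        labels.setdefault(name, value)
    return labels


def final_labels(labels: Dict[str, str]) -> Dict[str, str]:
    """Labels attached to scraped series after relabeling (sketch)."""
    out = {k: v for k, v in labels.items() if not k.startswith("__")}
    # instance defaults to the (possibly relabeled) target address.
    out.setdefault("instance", labels["__address__"])
    return out
```

For the `localhost:9090` target above, the result is `job="prometheus"`, `instance="localhost:9090"`, `group="infrastructure"` and `region="us-east"`.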
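3.3.2 File-Based Service Discovery
```yaml
scrape_configs:
  - job_name: 'file-sd'
    file_sd_configs:
      # Files to read targets from; the last path element may contain a * wildcard
      - files:
          - /etc/prometheus/targets/*.yml
          - /etc/prometheus/targets/*.yaml
        # Fallback re-read interval (file changes are also picked up via file watches)
        refresh_interval: 1m
```
A target file, for example:
```yaml
# /etc/prometheus/targets/server.yml
- targets:
    - '192.168.1.10:9090'
    - '192.168.1.11:9090'
  labels:
    service: api
    env: prod
```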
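A file_sd file is a list of target groups, and each group's `labels` apply to every target in it. file_sd also accepts JSON files with the same structure, so the sketch below uses JSON to stay within the Python standard library. `load_target_groups` is a hypothetical helper; `__meta_filepath` is the meta label file_sd attaches to each target.

```python
import json
from typing import Dict, List


def load_target_groups(text: str, source: str) -> List[Dict[str, str]]:
    """Expand file_sd target groups into one label set per target (sketch)."""
    groups = json.loads(text)
    if not isinstance(groups, list):
        raise ValueError("file_sd content must be a list of target groups")
    result = []
    for group in groups:
        labels = group.get("labels") or {}
        for target in group["targets"]:
            # Group labels are shared by every target in the group.
            result.append({**labels, "__address__": target,
                           "__meta_filepath": source})
    return result
```

The YAML target file above would yield two label sets, both carrying `service="api"` and `env="prod"`.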
3.3.3 DNS Service Discovery
```yaml
scrape_configs:
  - job_name: 'dns-sd'
    dns_sd_configs:
      - names:
          - 'prometheus-instances.default.svc.cluster.local'
        # A records carry no port, so `port` is required for this type
        type: 'A'
        port: 9090
        refresh_interval: 30s
```
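For `type: 'A'`, each resolved IP is combined with `port` to form a target address. The sketch below models that step; it is an assumption-laden stand-in, since it resolves through the OS (`socket.getaddrinfo`) whereas Prometheus queries DNS servers itself. The resolver is injectable so the logic can be tried without network access, and all function names are hypothetical.

```python
import socket
from typing import Callable, Dict, List


def system_resolver(name: str) -> List[str]:
    """IPv4 addresses for `name` via the OS resolver (roughly an A lookup)."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def join_host_port(host: str, port: int) -> str:
    """host:port, bracketing IPv6 literals like Go's net.JoinHostPort."""
    if ":" in host:
        return f"[{host}]:{port}"
    return f"{host}:{port}"


def dns_targets(names: List[str], port: int,
                resolve: Callable[[str], List[str]] = system_resolver
                ) -> List[Dict[str, str]]:
    """One target per resolved address, tagged with the queried name (sketch)."""
    targets = []
    for name in names:
        for ip in resolve(name):
            targets.append({"__address__": join_host_port(ip, port),
                            "__meta_dns_name": name})
    return targets
```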
3.3.4 Kubernetes Service Discovery
```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        # api_server may be omitted when Prometheus runs inside the cluster;
        # it then uses the pod's service account credentials automatically
        api_server: https://kubernetes.default.svc:443
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    # Relabel configuration
    relabel_configs:
      # Only scrape pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Use the prometheus.io/path annotation as the metrics path, if present
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Replace the port in __address__ with the prometheus.io/port annotation
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Copy all pod labels onto the target
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
```
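The address rewrite is the least obvious rule above. The two source values are joined with `;` (the default separator) before matching, and relabel regexes must match the whole string. This small check uses Python's `re` in place of Go's RE2 (both accept this pattern with the same meaning); `rewrite_address` is a hypothetical helper.

```python
import re

# Same pattern as the relabel rule: host, optional ":port", ";", annotation port.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Apply the __address__ rewrite rule to one pod (sketch)."""
    joined = f"{address};{port_annotation}"
    m = ADDRESS_RE.fullmatch(joined)  # relabel regexes are fully anchored
    if m is None:
        return address  # no match: __address__ is left unchanged
    return f"{m.group(1)}:{m.group(2)}"  # replacement: $1:$2
```

A pod at `10.1.2.3:8080` annotated with `prometheus.io/port: "9102"` is scraped at `10.1.2.3:9102`; without the annotation the joined value ends in `;`, the regex fails, and the address is kept.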
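Supported Kubernetes roles:
| Role | Description |
|---|---|
| node | Cluster nodes |
| service | Kubernetes Services |
| pod | Kubernetes Pods |
| endpoints | Service endpoints |
| endpointslice | Service endpoint slices |
| ingress | Ingress resources |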
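3.3.5 EC2 Service Discovery
```yaml
scrape_configs:
  - job_name: 'ec2'
    ec2_sd_configs:
      - region: us-east-1
        # Placeholder keys from the AWS documentation. If these fields are omitted,
        # credentials come from the environment (e.g. AWS_ACCESS_KEY_ID /
        # AWS_SECRET_ACCESS_KEY or an IAM role), which is preferable to hard-coding them.
        access_key: AKIAIOSFODNN7EXAMPLE
        secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
        # Only discover instances tagged Environment=production
        filters:
          - name: tag:Environment
            values:
              - production
        # Port appended to each instance's private IP address
        port: 9100
        refresh_interval: 1m
```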
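3.4 Relabel Configuration
`relabel_configs` is one of Prometheus's most powerful features: after service discovery and before scraping, it rewrites each target's labels and can drop targets altogether. Each rule joins the values of its `source_labels` with `separator` (default `;`) and matches the result against `regex` (default `(.*)`). Regexes are anchored at both ends, so `regex: 'test'` matches only the exact value `test`. The same actions are available in `metric_relabel_configs`, which is applied to scraped samples rather than to targets.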
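3.4.1 Basic Actions
| Action | Description |
|---|---|
| replace | If `regex` matches the joined source values, write `replacement` (which may reference capture groups) to `target_label`; this is the default action |
| keep | Drop targets whose joined source values do not match `regex` |
| drop | Drop targets whose joined source values match `regex` |
| hashmod | Set `target_label` to a hash of the joined source values modulo `modulus` |
| labelmap | Copy every label whose name matches `regex` to a new name built from `replacement` |
| labeldrop | Remove labels whose names match `regex` |
| labelkeep | Remove labels whose names do not match `regex` |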
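To make the table concrete, here is a simplified model of the seven actions. It is a sketch under stated assumptions, not Prometheus's implementation: Python's `re` stands in for Go's RE2 (the patterns in this chapter behave the same in both), rules are plain dicts using the YAML field names, and `relabel` / `expand` are hypothetical names. The `hashmod` branch follows our understanding of Prometheus: MD5 of the joined values, last 8 bytes read as a big-endian integer.

```python
import hashlib
import re
from typing import Any, Dict, List, Optional

Labels = Dict[str, str]

# $1, ${1} or ${name} references inside replacement / target_label.
_REF = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand(template: str, m: "re.Match[str]") -> str:
    """Expand group references; unknown or unmatched groups become ''."""
    def one(ref: "re.Match[str]") -> str:
        key = ref.group(1) or ref.group(2)
        try:
            return m.group(int(key) if key.isdigit() else key) or ""
        except IndexError:
            return ""
    return _REF.sub(one, template)


def relabel(labels: Labels, rules: List[Dict[str, Any]]) -> Optional[Labels]:
    """Apply rules in order; return None when keep/drop discards the target."""
    lb = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")
        regex = re.compile(rule.get("regex", "(.*)"))
        value = rule.get("separator", ";").join(
            lb.get(name, "") for name in rule.get("source_labels", []))
        if action == "keep" and not regex.fullmatch(value):
            return None
        if action == "drop" and regex.fullmatch(value):
            return None
        if action == "replace":
            m = regex.fullmatch(value)
            if m:
                target = expand(rule["target_label"], m)
                result = expand(rule.get("replacement", "$1"), m)
                if result:
                    lb[target] = result
                else:
                    lb.pop(target, None)  # an empty result removes the label
        elif action == "hashmod":
            digest = hashlib.md5(value.encode()).digest()
            bucket = int.from_bytes(digest[8:], "big") % rule["modulus"]
            lb[rule["target_label"]] = str(bucket)
        elif action == "labelmap":
            for name in list(lb):
                m = regex.fullmatch(name)
                if m:
                    lb[expand(rule.get("replacement", "$1"), m)] = lb[name]
        elif action == "labeldrop":
            lb = {k: v for k, v in lb.items() if not regex.fullmatch(k)}
        elif action == "labelkeep":
            lb = {k: v for k, v in lb.items() if regex.fullmatch(k)}
    return lb
```

For example, `relabel({"env": "test"}, [{"source_labels": ["env"], "regex": "test", "action": "drop"}])` returns `None`, meaning the target is discarded.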
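3.4.2 The replace Action
```yaml
relabel_configs:
  # Set instance to the host part of the target address (strip the port)
  - source_labels: [__address__]
    target_label: instance
    regex: '(.+):\d+'
    replacement: '${1}'
    action: replace
  # Combine two labels: the values are joined with `separator` before matching,
  # so the regex needs one capture group per source label
  - source_labels: [env, region]
    separator: ';'
    regex: '(.*);(.*)'
    target_label: environment
    replacement: '${1}-${2}'
    action: replace
```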
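Why the combining rule needs an explicit two-group `regex`: `${2}` only has a value if the regex defines a second capture group. The sketch below models just the `replace` step (join, full match, expand `${n}`); unknown groups expand to an empty string, as Go's `regexp` `Expand` does. `replace_action` is a hypothetical helper.

```python
import re
from typing import List, Optional


def _group(m: "re.Match[str]", index: int) -> str:
    """Value of capture group `index`, or '' if the regex has no such group."""
    if index > m.re.groups:
        return ""
    return m.group(index) or ""


def replace_action(values: List[str], separator: str, regex: str,
                   replacement: str) -> Optional[str]:
    """New target_label value, or None when the regex does not match (sketch)."""
    joined = separator.join(values)
    m = re.fullmatch(regex, joined)
    if m is None:
        return None  # no match: target_label is left untouched
    return re.sub(r"\$\{(\d+)\}", lambda ref: _group(m, int(ref.group(1))), replacement)
```

With `separator: '-'` and the default regex `(.*)`, `${1}-${2}` turns `prod` and `us-east` into `prod-us-east-` (note the trailing `-`), because `${2}` refers to a group that does not exist. The two-group regex produces `prod-us-east`.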
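3.4.3 The keep and drop Actions
```yaml
relabel_configs:
  # Keep only targets that belong to the nginx-ingress-controller service
  - source_labels: [__meta_kubernetes_service_name]
    regex: 'nginx-ingress-controller'
    action: keep
  # Drop targets from the test environment
  - source_labels: [env]
    regex: 'test'
    action: drop
```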
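Because relabel regexes are anchored, `regex: 'test'` drops `env="test"` but not `env="test-1"`, and a target with no `env` label is matched against the empty string. A minimal sketch of keep/drop under that assumption (`keep_or_drop` is a hypothetical helper; a missing label contributes `""`):

```python
import re
from typing import Dict, List, Optional


def keep_or_drop(labels: Dict[str, str], source_labels: List[str], regex: str,
                 action: str, separator: str = ";") -> Optional[Dict[str, str]]:
    """Return the labels if the target survives, None if it is discarded (sketch)."""
    value = separator.join(labels.get(n, "") for n in source_labels)
    matched = re.fullmatch(regex, value) is not None  # anchored at both ends
    if action == "keep":
        return labels if matched else None
    if action == "drop":
        return None if matched else labels
    raise ValueError(f"unsupported action: {action}")
```

Use a pattern such as `test.*` if every environment starting with `test` should be dropped.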
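3.4.4 The labelmap Action
```yaml
relabel_configs:
  # Map Kubernetes pod labels to Prometheus labels:
  # __meta_kubernetes_pod_label_app="api" becomes app="api".
  # labelmap matches regex against label *names*, so source_labels is not used.
  - action: labelmap
    regex: '__meta_kubernetes_pod_label_(.+)'
    replacement: '${1}'
```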
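`labelmap` copies values rather than renaming: the original `__meta_*` labels stay until the end of relabeling, when every label starting with `__` is removed. A minimal sketch of the copy step, assuming only `${n}` references in the replacement (`labelmap` here is a hypothetical helper):

```python
import re
from typing import Dict


def labelmap(labels: Dict[str, str], regex: str,
             replacement: str = "${1}") -> Dict[str, str]:
    """Copy each label whose name fully matches `regex` to a new name (sketch)."""
    pattern = re.compile(regex)
    out = dict(labels)
    for name, value in labels.items():
        m = pattern.fullmatch(name)
        if m:
            # Build the new label name from the capture groups.
            new_name = re.sub(r"\$\{(\d+)\}",
                              lambda ref: m.group(int(ref.group(1))) or "",
                              replacement)
            out[new_name] = value
    return out
```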
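3.4.5 The hashmod Action
hashmod is typically used to shard targets across several Prometheus servers:
```yaml
relabel_configs:
  # Hash the target address into 4 buckets
  # (instance is not set yet at this stage, so hash __address__)
  - source_labels: [__address__]
    target_label: __tmp_hash
    modulus: 4
    action: hashmod
  # This server keeps only bucket 0; the other shards keep 1, 2 and 3
  - source_labels: [__tmp_hash]
    regex: '0'
    action: keep
```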
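The point of hashmod sharding is that every target lands in exactly one bucket, so N servers with `regex: '0'` … `regex: 'N-1'` together scrape each target once. This sketch models the hash as we understand Prometheus computes it (MD5 of the joined source values, last 8 bytes as a big-endian unsigned integer, modulo `modulus`); `hashmod` and `shard_targets` are hypothetical helpers.

```python
import hashlib
from typing import List


def hashmod(value: str, modulus: int) -> int:
    """Bucket for `value`, modelled on Prometheus's hashmod action (sketch)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def shard_targets(addresses: List[str], modulus: int, shard: int) -> List[str]:
    """Targets kept by the server configured with regex: '<shard>' (sketch)."""
    return [a for a in addresses if hashmod(a, modulus) == shard]
```

Because the bucket depends only on the address, the assignment is stable across restarts; changing `modulus` reshuffles most targets.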
3.5 Alerting Configuration
```yaml
alerting:
  alertmanagers:
    # Static configuration
    - static_configs:
        - targets:
            - alertmanager:9093
    # Kubernetes service discovery
    - kubernetes_sd_configs:
        - role: pod
          namespaces:
            names:
              - monitoring
      # Relabeling to select the Alertmanager pods
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app]
          regex: alertmanager
          action: keep
        - source_labels: [__meta_kubernetes_pod_container_port_number]
          regex: "9093"
          action: keep
```
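3.6 Rule Files
```yaml
rule_files:
  # Absolute path with a wildcard
  - "/etc/prometheus/rules/*.yml"
  # Relative paths are resolved against the directory of prometheus.yml.
  # "**" is NOT recursive: like "*", it matches exactly one directory level.
  - "rules/**/*.yml"
  # A pattern that matches no files is not an error, so this entry is effectively optional
  - "rules/optional/*.yml"
```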
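The glob behaviour can be checked locally. Assumption: Python's `glob.glob(..., recursive=False)` treats `**` like `*`, which is close to Go's `filepath.Glob` used by Prometheus (the two differ in details such as hidden files). Relative patterns are joined to the config file's directory, and a pattern with no matches contributes nothing. `expand_rule_files` is a hypothetical helper.

```python
import glob
import os
from typing import List


def expand_rule_files(config_dir: str, patterns: List[str]) -> List[str]:
    """Resolve rule_files patterns roughly the way Prometheus does (sketch)."""
    files: List[str] = []
    for pattern in patterns:
        if not os.path.isabs(pattern):
            # Relative patterns are relative to the config file's directory.
            pattern = os.path.join(config_dir, pattern)
        # recursive=False: "**" matches a single path element, like "*".
        files.extend(sorted(glob.glob(pattern, recursive=False)))
    return files
```

With `rules/a/x.yml` and `rules/a/b/y.yml` on disk, `rules/**/*.yml` matches only `rules/a/x.yml`. To load deeper directories, list one pattern per level.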
3.7 Complete Configuration Examples
3.7.1 Production Configuration
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'prod'
    env: 'production'
    datacenter: 'us-east-1'
  scrape_timeout: 10s
  # tls_config is not a global option; TLS is configured per job below

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.monitoring.svc:9093

rule_files:
  - /etc/prometheus/rules/*.yml
  - /etc/prometheus/alerts/*.yml

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          service: prometheus

  # Kubernetes API server
  - job_name: 'kubernetes-apiserver'
    kubernetes_sd_configs:
      - role: endpoints
        # api_server omitted: inside the cluster, discovery uses the pod's service account
        namespaces:
          names:
            - default
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: default;kubernetes

  # Kubernetes nodes (the kubelet serves metrics over HTTPS on port 10250)
  - job_name: 'kubernetes-nodes'
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: instance
        replacement: '${1}'

  # Kubernetes pods
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

  # Common service exporters
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'node-exporter:9100'
        labels:
          service: node
  - job_name: 'mysqld-exporter'
    static_configs:
      - targets:
          - 'mysqld-exporter:9104'
        labels:
          service: mysql
  - job_name: 'redis-exporter'
    static_configs:
      - targets:
          - 'redis-exporter:9121'
        labels:
          service: redis
```
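3.7.2 Multi-Environment Configuration
Prometheus has no built-in templating for the whole configuration file (there is no `--config.envsubst-templates` flag). A common approach is to keep a template and render it before Prometheus starts, for example with GNU `envsubst`. For `external_labels` only, Prometheus 2.x also offers the `--enable-feature=expand-external-labels` flag, which replaces `${VAR}` references with environment variable values when the configuration is loaded.
```yaml
# config.template.yml
# Render before starting Prometheus; listing the variables keeps relabel
# references such as $1 untouched:
#   envsubst '${SCRAPE_INTERVAL} ${CLUSTER_NAME} ${ENVIRONMENT}' \
#     < config.template.yml > prometheus.yml
global:
  scrape_interval: ${SCRAPE_INTERVAL}
  external_labels:
    cluster: '${CLUSTER_NAME}'
    env: '${ENVIRONMENT}'
```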
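If `envsubst` is not available, the rendering step is easy to script. The sketch below is a stand-in, not an official tool: it substitutes only `${NAME}` where `NAME` starts with a letter or underscore, so relabel references such as `${1}` and `$1` pass through unchanged, and, unlike GNU `envsubst` (which replaces undefined variables with empty strings), it fails loudly on a missing variable. `render_template` is a hypothetical name.

```python
import re
from typing import Mapping

# ${NAME} with NAME = [A-Za-z_][A-Za-z0-9_]*; ${1} and $1 are never matched.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(text: str, env: Mapping[str, str]) -> str:
    """Fill ${VAR} placeholders from `env`; raise KeyError if any are undefined."""
    missing = sorted({m.group(1) for m in _VAR.finditer(text) if m.group(1) not in env})
    if missing:
        raise KeyError(f"undefined variables: {', '.join(missing)}")
    return _VAR.sub(lambda m: env[m.group(1)], text)
```

Typical use is `render_template(template_text, os.environ)`, writing the result to `prometheus.yml` and running `promtool check config` on it before starting Prometheus.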
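3.8 Configuration Validation
3.8.1 Syntax Checking
```bash
# Check the configuration file (this also checks the rule files it references)
./promtool check config prometheus.yml
# Check rule files
./promtool check rules /etc/prometheus/rules/*.yml
```
3.8.2 Formatting the Configuration
`promtool` has no subcommand for reformatting a configuration file (`promtool config format` does not exist). `promtool check config` already reports YAML syntax errors; for consistent layout, use your editor or a general-purpose YAML formatter.
3.8.3 Reloading the Configuration
Prometheus re-reads its configuration on `SIGHUP` or on an HTTP POST to `/-/reload`. The HTTP endpoint is available only when Prometheus is started with `--web.enable-lifecycle`. If the new configuration is invalid, Prometheus logs the error and keeps running with the previous one, so run `promtool check config` first.
```bash
# Reload via HTTP (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
# Or send SIGHUP to the Prometheus process
kill -HUP $(pidof prometheus)
```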
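3.9 Chapter Summary
This chapter covered the Prometheus configuration system in depth:
- Global configuration - global parameters and defaults
- Scrape configuration - the various service discovery mechanisms
- Relabel configuration - powerful label manipulation
- Alerting configuration - Alertmanager integration
- Rule files - managing rule files
- Configuration validation - syntax checking and reloading
📖 Next Steps
- Chapter 4: The PromQL Query Language - mastering PromQL queries
- Chapter 5: Exporter Configuration - learning how to expose metrics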