Chapter 9: Grafana Integration
A detailed walkthrough of integrating Prometheus with Grafana, covering data source configuration, dashboards, and alerting.
Last updated: 2024-01-01
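9.1 Grafana Overview
Grafana is the visualization tool recommended in the Prometheus documentation; it offers a wide range of panel types and flexible dashboards.
9.1.1 Installation
# Install with Docker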
docker run -d \
--name=grafana \
-p 3000:3000 \
-v grafana_data:/var/lib/grafana \
grafana/grafana:latest
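# Install with Helm (the Grafana chart is published in the grafana chart repository)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana \
--namespace monitoring \
--create-namespace \
--set adminPassword='your_password' \
--set persistence.enabled=true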
9.1.2 Default credentials
- Username: admin
- Password: admin (Grafana prompts you to change it on first login)
9.2 Data Source Configuration
9.2.1 Adding a Prometheus data source
Through the web UI:
- Go to Configuration → Data Sources (Connections → Data sources in recent Grafana versions)
- Click Add data source
- Select Prometheus
- Fill in the settings:
# HTTP settings
URL: http://prometheus:9090
Access: Server (default) # recommended
# Auth settings
Basic auth: ✓
User: admin
Password: your_password
# Prometheus settings
Scrape interval: 15s
Query timeout: 60s
HTTP Method: POST
9.2.2 Using a provisioning file
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      httpMethod: POST
      timeInterval: 15s
      queryTimeout: 60s
9.3 Dashboard Configuration
9.3.1 Importing dashboards automatically
# /etc/grafana/provisioning/dashboards/prometheus.yml
apiVersion: 1
providers:
  - name: 'Prometheus'
    orgId: 1
    folder: 'Prometheus'
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /var/lib/grafana/dashboards
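The file provider above loads dashboard JSON files from the configured path. As a minimal sketch of how such files might be produced, the hypothetical helper `write_dashboard` below writes one dashboard per file, named after its `uid`. The file naming scheme and clearing the numeric `id` are assumptions made for illustration, not requirements stated in this chapter, and the demo writes to a temporary directory rather than /var/lib/grafana/dashboards.

```python
import json
import tempfile
from pathlib import Path


def write_dashboard(dashboard: dict, directory: Path) -> Path:
    """Write one dashboard as <uid>.json into the given directory."""
    uid = dashboard.get("uid")
    if not uid:
        raise ValueError("dashboard needs a 'uid' to derive a stable file name")
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{uid}.json"
    # Assumption: drop the numeric id so the file does not carry an id
    # that only made sense in the Grafana instance it was exported from.
    body = {**dashboard, "id": None}
    target.write_text(json.dumps(body, indent=2, ensure_ascii=False), encoding="utf-8")
    return target


# Demo against a temporary directory instead of the real provisioning path.
with tempfile.TemporaryDirectory() as tmp:
    path = write_dashboard(
        {"uid": "node-exporter", "title": "Node Exporter Dashboard", "id": 7},
        Path(tmp),
    )
    saved = json.loads(path.read_text(encoding="utf-8"))
    print(path.name, saved["id"])
```

In a real deployment the directory argument would be the `path` configured in the provider, typically mounted into the Grafana container.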
9.3.2 Dashboard JSON format
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {"mode": "palette-classic"},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {"tooltip": false, "viz": false, "legend": false},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {"type": "linear"},
"showPoints": "never",
"spanNulls": true
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "red", "value": 80}
]
},
"unit": "percent"
}
},
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
"id": 1,
"options": {
"legend": {"displayMode": "list", "placement": "bottom"},
"tooltip": {"mode": "single"}
},
"pluginVersion": "8.0.0",
"targets": [
{
"expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"legendFormat": "{{ instance }}",
"refId": "A"
}
],
"title": "CPU usage",
"type": "timeseries"
}
],
"refresh": "5s",
"schemaVersion": 27,
"style": "dark",
"tags": ["node", "system"],
"templating": {
"list": [
{
"allValue": ".*",
"current": {"selected": true, "text": "All", "value": "$__all"},
"datasource": "Prometheus",
"definition": "label_values(node_cpu_seconds_total, instance)",
"description": null,
"hide": 0,
"includeAll": true,
"label": "Instance",
"multi": true,
"name": "instance",
"options": [],
"query": "label_values(node_cpu_seconds_total, instance)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
}
]
},
"time": {"from": "now-6h", "to": "now"},
"timepicker": {},
"timezone": "browser",
"title": "Node Exporter Dashboard",
"uid": "node-exporter",
"version": 1
}
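Hand-editing a document of this size is error-prone, so dashboards are often generated. The sketch below assembles a much smaller dashboard with the same top-level keys (`uid`, `title`, `schemaVersion`, `time`, `panels`). The helper names `timeseries_panel` and `build_dashboard` are hypothetical, and the set of fields kept is an assumption about what a minimal dashboard needs, not a complete schema.

```python
import json


def timeseries_panel(panel_id, title, expr, x=0, y=0):
    """Return a minimal timeseries panel holding a single Prometheus query."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "datasource": "Prometheus",
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        "fieldConfig": {"defaults": {"unit": "percent"}},
        "targets": [{"expr": expr, "legendFormat": "{{ instance }}", "refId": "A"}],
    }


def build_dashboard(uid, title, panels):
    """Assemble the top-level dashboard object and reject duplicate panel ids."""
    ids = [panel["id"] for panel in panels]
    if len(ids) != len(set(ids)):
        raise ValueError("panel ids must be unique within a dashboard")
    return {
        "id": None,
        "uid": uid,
        "title": title,
        "schemaVersion": 27,
        "time": {"from": "now-6h", "to": "now"},
        "panels": panels,
    }


cpu = timeseries_panel(
    1,
    "CPU usage",
    '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)',
)
memory = timeseries_panel(
    2,
    "Memory usage",
    "(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100",
    x=12,  # place the second panel in the right half of the 24-column grid
)
dashboard = build_dashboard("node-exporter", "Node Exporter Dashboard", [cpu, memory])
print(json.dumps(dashboard, indent=2))
```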
9.4 Common Panel Types
9.4.1 Time series
# CPU usage per instance (%)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Network traffic across instances (bytes per second)
rate(node_network_receive_bytes_total[5m])
rate(node_network_transmit_bytes_total[5m])
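To make the arithmetic behind the CPU query concrete, here is a small worked sketch. It treats `rate()` as the plain per-second increase between the first and last sample in the window, which is a simplification: Prometheus also handles counter resets and extrapolates to the window edges. The sample values and the function names `simple_rate` and `cpu_usage_percent` are invented for illustration.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample.

    Simplified stand-in for rate(): no counter-reset handling, no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_by_cpu):
    """Mirror 100 - avg(rate(idle)) * 100 for the cores of one instance."""
    rates = [simple_rate(samples) for samples in idle_by_cpu.values()]
    idle_fraction = sum(rates) / len(rates)
    return 100 - idle_fraction * 100


# Invented idle counters for two cores over a 300-second window.
idle = {
    "0": [(0, 1000.0), (300, 1240.0)],  # 240 idle seconds in 300 s, rate 0.8
    "1": [(0, 2000.0), (300, 2180.0)],  # 180 idle seconds in 300 s, rate 0.6
}
usage = cpu_usage_percent(idle)
print(round(usage, 2))
```

With an average idle fraction of 0.7, the instance is about 30% busy, which is the value the panel would plot for that window.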
9.4.2 Stat
# Show the current value
up{job="prometheus"}
count(up)
sum(http_requests_total)
9.4.3 Gauge
# Memory usage (%)
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
/ node_memory_MemTotal_bytes * 100
9.4.4 Table
# Service status
up{job=~"api.*|web.*"}
9.4.5 Heatmap
# Request latency distribution
sum by (le) (
rate(http_request_duration_seconds_bucket[5m])
)
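Prometheus histogram buckets are cumulative: the series for `le="0.5"` also counts every observation already counted under `le="0.1"`. A heatmap cell, by contrast, should show only the observations that fall inside one bucket. The sketch below shows that conversion on invented numbers; `per_bucket_counts` is a hypothetical helper, and Grafana's own handling of heatmap-formatted queries may differ in detail.

```python
def per_bucket_counts(cumulative):
    """Turn cumulative (le, value) pairs into per-bucket values.

    Buckets are sorted by their upper bound; "+Inf" sorts last.
    """
    ordered = sorted(cumulative, key=lambda item: float(item[0]))
    result = []
    previous = 0.0
    for le, value in ordered:
        result.append((le, value - previous))
        previous = value
    return result


# Invented per-second rates for four buckets, deliberately out of order.
buckets = [("1", 14.0), ("0.1", 5.0), ("+Inf", 15.0), ("0.5", 12.0)]
cells = per_bucket_counts(buckets)
for le, value in cells:
    print(f"le={le}: {value}")
```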
9.5 Template Variables
9.5.1 Query variable
{
"name": "job",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(up, job)",
"refresh": 1,
"sort": 1
}
9.5.2 Multi-value variable
{
"name": "instance",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(up{job=~\"$job\"}, instance)",
"multi": true,
"includeAll": true,
"allValue": ".*"
}
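The `=~` matcher matters here because a multi-value variable expands to a regular expression rather than a single string. The sketch below approximates that expansion: several selected values become `(a|b)`, and the All option becomes the configured `allValue`. It is a simplified model, not Grafana's implementation. In particular, `re.escape` stands in for Grafana's own escaping rules, and the function names are hypothetical.

```python
import re

ALL = "$__all"


def format_value(selected, all_value=".*"):
    """Render the selected values of one variable for use in a =~ matcher."""
    if ALL in selected:
        return all_value
    escaped = [re.escape(value) for value in selected]
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"


def interpolate(query, variables):
    """Replace each $name in the query with its formatted selection."""
    return re.sub(r"\$(\w+)", lambda m: format_value(variables[m.group(1)]), query)


query = 'label_values(up{job=~"$job"}, instance)'
print(interpolate(query, {"job": ["api", "web"]}))
print(interpolate(query, {"job": [ALL]}))
```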
9.5.3 Chained variables
{
"name": "namespace",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_pod_info, namespace)"
},
{
"name": "pod",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_pod_info{namespace=\"$namespace\"}, pod)"
}
9.5.4 Custom variable
{
"name": "interval",
"type": "custom",
"options": [
{"value": "1m", "text": "1m"},
{"value": "5m", "text": "5m"},
{"value": "15m", "text": "15m"}
]
}
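9.6 Alerting
9.6.1 Panel alerts
This flow belongs to legacy dashboard alerting, which newer Grafana releases replace with Grafana-managed alert rules.
- Edit the panel → Alert tab
- Click Create Alert
- Configure the condition:
# Condition
WHEN avg() OF query(A, 5m, now) IS ABOVE 80
# Query A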
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
9.6.2 Alert rule configuration
{
"name": "HighCPU",
"condition": "C",
"data": [
{
"refId": "A",
"query": {
"params": ["A", "5m", "now"]
},
"datasourceUid": "prometheus",
"model": {
"expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"intervalMs": 1000,
"maxDataPoints": 43200,
"refId": "A"
}
},
{
"refId": "B",
"query": {
"params": []
},
"datasourceUid": "-100",
"model": {
"expression": "A",
"reducer": "avg",
"refId": "B",
"type": "reduce"
}
},
{
"refId": "C",
"query": {
"params": []
},
"datasourceUid": "-100",
"model": {
"conditions": [
{
"evaluator": {"params": [80], "type": "gt"},
"operator": {"type": "and"},
"query": {"params": ["B"]},
"reducer": {"type": "avg"}
}
],
"expression": "B",
"refId": "C",
"type": "threshold"
}
}
],
"noDataState": "NoData",
"execErrState": "Error",
"for": "5m"
}
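The rule above chains three steps: query A returns a series, B reduces it to a single number with `avg`, and C compares that number with 80. The `for` of 5m then requires the condition to hold continuously before the alert fires. The following sketch models that evaluation loop under simplifying assumptions: one series, a fixed evaluation schedule, and only the Normal, Pending, and Alerting states. `AlertState` and `evaluate` are hypothetical names, not Grafana APIs.

```python
from dataclasses import dataclass
from typing import Optional


def reduce_avg(values):
    """Step B: collapse the series returned by query A into one number."""
    return sum(values) / len(values)


@dataclass
class AlertState:
    state: str = "Normal"
    breaching_since: Optional[float] = None


def evaluate(previous, now, series, threshold=80.0, for_seconds=300.0):
    """Steps C and 'for': threshold check plus the pending period."""
    if reduce_avg(series) <= threshold:
        return AlertState("Normal", None)
    since = previous.breaching_since if previous.breaching_since is not None else now
    status = "Alerting" if now - since >= for_seconds else "Pending"
    return AlertState(status, since)


state = AlertState()
history = []
for now in (0, 60, 300):  # evaluation timestamps in seconds
    state = evaluate(state, now, [85.0, 90.0])
    history.append(state.state)
print(history)

recovered = evaluate(state, 360, [10.0])
print(recovered.state)
```

The alert stays Pending until the threshold has been exceeded for the full 300 seconds, and a single healthy evaluation resets it to Normal.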
9.7 Common Dashboard Examples
9.7.1 Node Exporter Dashboard
{
"panels": [
{
"title": "CPU usage",
"type": "timeseries",
"targets": [
{
"expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode='idle'}[5m])) * 100)"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"steps": [
{"value": 0, "color": "green"},
{"value": 70, "color": "yellow"},
{"value": 85, "color": "red"}
]
}
}
}
},
{
"title": "Memory usage",
"type": "gauge",
"targets": [
{
"expr": "(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"max": 100,
"thresholds": {
"steps": [
{"value": 0, "color": "green"},
{"value": 70, "color": "yellow"},
{"value": 85, "color": "red"}
]
}
}
}
},
{
"title": "Disk usage",
"type": "gauge",
"targets": [
{
"expr": "(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100"
}
]
}
]
}
9.7.2 Kubernetes Dashboard
# CPU usage per pod
sum(rate(container_cpu_usage_seconds_total{container!="POD", container!=""}[5m])) by (pod)
# Memory usage per pod
sum(container_memory_working_set_bytes{container!="POD", container!=""}) by (pod)
# CPU usage per namespace (requires a recording rule with this name to be defined)
sum(namespace_cpu:container_cpu_usage_seconds_total:sum_rate) by (namespace)
# Number of pods per namespace
count by (namespace) (kube_pod_info)
9.8 Grafana API
9.8.1 Creating a data source
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
http://localhost:3000/api/datasources \
-d '{
"name": "Prometheus",
"type": "prometheus",
"access": "proxy",
"url": "http://prometheus:9090",
"isDefault": true
}'
9.8.2 Querying data
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
http://localhost:3000/api/ds/query \
-d '{"queries":[{"refId":"A","expr":"up","datasource":{"type":"prometheus","uid":"prometheus"}}],"from":"now-5m","to":"now"}'
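The same query can be issued from a script. The `/api/ds/query` endpoint expects a POST request whose JSON body carries the `queries` list together with a `from` and `to` time range. The sketch below only builds the request object with the standard library. The function name `ds_query_request`, the token value, and the data source uid `prometheus` are assumptions for illustration, and actually sending the request is left to the caller.

```python
import json
import urllib.request


def ds_query_request(base_url, api_key, expr, datasource_uid,
                     time_from="now-5m", time_to="now"):
    """Build (but do not send) a POST request for Grafana's /api/ds/query."""
    body = {
        "queries": [
            {
                "refId": "A",
                "expr": expr,
                "datasource": {"type": "prometheus", "uid": datasource_uid},
            }
        ],
        "from": time_from,
        "to": time_to,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/ds/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = ds_query_request("http://localhost:3000", "example-token", "up", "prometheus")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=10)
```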
9.8.3 Importing a dashboard
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
http://localhost:3000/api/dashboards/db \
-d '{
"dashboard": {...},
"overwrite": true,
"message": "Import Node Exporter Dashboard"
}'
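The `dashboard` field in the request body holds the full dashboard JSON. A small sketch of wrapping an exported dashboard into that body follows. The helper name `import_payload` is hypothetical, and resetting `id` to null is an assumption: it avoids sending a numeric id that belongs to another Grafana instance, so that matching falls back to the `uid`.

```python
import json


def import_payload(dashboard, message, overwrite=True):
    """Wrap a dashboard dict in the request body used by the curl example."""
    body = dict(dashboard)  # shallow copy so the caller's dict is untouched
    body["id"] = None       # assumption: ids from another instance do not carry over
    return {"dashboard": body, "overwrite": overwrite, "message": message}


# Stand-in for a dashboard exported from another Grafana instance.
exported = json.loads(
    '{"id": 42, "uid": "node-exporter", "title": "Node Exporter Dashboard", "panels": []}'
)
payload = import_payload(exported, "Import Node Exporter Dashboard")
print(json.dumps(payload, indent=2))
```

The printed JSON can be saved to a file and passed to curl with `-d @payload.json`.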
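9.9 Chapter Summary
This chapter covered integrating Prometheus with Grafana:
- Grafana overview: installation and basic setup
- Data source configuration: Prometheus data source settings
- Dashboard configuration: JSON format and import
- Common panel types: time series, gauge, table, and others
- Template variables: query variables and chained variables
- Alerting: panel alerts and rule configuration
- Common dashboards: practical examples
- Grafana API: automated management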