Chapter 9: Grafana Integration

This chapter covers integrating Prometheus with Grafana: data source configuration, dashboards, and alerting.

Last updated: 2024-01-01

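9.1 Grafana Overview

Grafana is the visualization tool recommended in the Prometheus documentation. It offers a wide range of panel types and flexible dashboards.

9.1.1 Installation

# Install with Docker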
docker run -d \
  --name=grafana \
  -p 3000:3000 \
  -v grafana_data:/var/lib/grafana \
  grafana/grafana:latest

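# Install with Helm (the Grafana chart is published in the grafana chart repository)
helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana \
  --namespace monitoring \
  --set adminPassword='your_password' \
  --set persistence.enabled=true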

9.1.2 Default Credentials

  • Username: admin
  • Password: admin (Grafana prompts you to change it on first login)

9.2 Data Source Configuration

9.2.1 Adding a Prometheus Data Source

Via the web UI:

  1. Go to Configuration → Data Sources
  2. Click Add data source
  3. Select Prometheus
  4. Fill in the settings:
# HTTP settings
URL: http://prometheus:9090
Access: Server (default)  # recommended

# Auth settings
Basic auth: enabled
User: admin
Password: your_password

# Prometheus settings
Scrape interval: 15s
Query timeout: 60s
HTTP Method: POST

9.2.2 Using a Provisioning File

# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      httpMethod: POST
      timeInterval: 15s
      queryTimeout: 60s

9.3 Dashboard Configuration

9.3.1 Importing Dashboards Automatically

# /etc/grafana/provisioning/dashboards/prometheus.yml
apiVersion: 1

providers:
  - name: 'Prometheus'
    orgId: 1
    folder: 'Prometheus'
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /var/lib/grafana/dashboards

9.3.2 Dashboard JSON Format

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "panels": [
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {"mode": "palette-classic"},
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {"tooltip": false, "viz": false, "legend": false},
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {"type": "linear"},
            "showPoints": "never",
            "spanNulls": true
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {"color": "green", "value": null},
              {"color": "red", "value": 80}
            ]
          },
          "unit": "percent"
        }
      },
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
      "id": 1,
      "options": {
        "legend": {"displayMode": "list", "placement": "bottom"},
        "tooltip": {"mode": "single"}
      },
      "pluginVersion": "8.0.0",
      "targets": [
        {
          "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
          "legendFormat": "{{ instance }}",
          "refId": "A"
        }
      ],
      "title": "CPU usage",
      "type": "timeseries"
    }
  ],
  "refresh": "5s",
  "schemaVersion": 27,
  "style": "dark",
  "tags": ["node", "system"],
  "templating": {
    "list": [
      {
        "allValue": ".*",
        "current": {"selected": true, "text": "All", "value": "$__all"},
        "datasource": "Prometheus",
        "definition": "label_values(node_cpu_seconds_total, instance)",
        "description": null,
        "hide": 0,
        "includeAll": true,
        "label": "Instance",
        "multi": true,
        "name": "instance",
        "options": [],
        "query": "label_values(node_cpu_seconds_total, instance)",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      }
    ]
  },
  "time": {"from": "now-6h", "to": "now"},
  "timepicker": {},
  "timezone": "browser",
  "title": "Node Exporter Dashboard",
  "uid": "node-exporter",
  "version": 1
}
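Most fields in the JSON above are panel display defaults. As a rough sketch, the same dashboard skeleton can be generated from code. The helper names `make_timeseries_panel` and `make_dashboard` are hypothetical, and the sketch assumes Grafana applies its own defaults for any display fields that are left out.

```python
import json


def make_timeseries_panel(panel_id, title, expr, unit="percent"):
    """Return a minimal timeseries panel dict (display options omitted)."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "datasource": "Prometheus",
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
        "fieldConfig": {"defaults": {"unit": unit}},
        "targets": [{"expr": expr, "legendFormat": "{{ instance }}", "refId": "A"}],
    }


def make_dashboard(uid, title, panels):
    """Wrap panels in the top-level dashboard fields used in the JSON above."""
    return {
        "id": None,  # serialized as null, as in the JSON above
        "uid": uid,
        "title": title,
        "tags": ["node", "system"],
        "time": {"from": "now-6h", "to": "now"},
        "refresh": "5s",
        "schemaVersion": 27,
        "version": 1,
        "panels": panels,
    }


cpu_expr = '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
dashboard = make_dashboard(
    "node-exporter",
    "Node Exporter Dashboard",
    [make_timeseries_panel(1, "CPU usage", cpu_expr)],
)
dashboard_json = json.dumps(dashboard, indent=2, ensure_ascii=False)
print(dashboard_json)
```

Writing the resulting text to a `.json` file under the provider path from section 9.3.1 is one way to get it picked up by provisioning.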

9.4 Common Panel Types

9.4.1 Time Series

# CPU usage per instance
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Network traffic across instances
rate(node_network_receive_bytes_total[5m])
rate(node_network_transmit_bytes_total[5m])
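To make the arithmetic behind the CPU query in this subsection concrete, here is a small sketch with made-up counter samples. `counter_rate` is a simplified stand-in for PromQL's `rate()`: it ignores counter resets and the extrapolation that Prometheus applies at window boundaries.

```python
def counter_rate(samples):
    """Per-second increase between the first and last (timestamp, value) samples.

    Simplification: ignores counter resets and the window-boundary
    extrapolation that PromQL's rate() performs.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_samples_per_core):
    """100 - avg(idle rate per core) * 100, mirroring the CPU query."""
    rates = [counter_rate(s) for s in idle_samples_per_core]
    return 100 - (sum(rates) / len(rates)) * 100


# Hypothetical idle-time counters for two cores over a 300 s window.
core0 = [(0, 1000.0), (300, 1240.0)]  # idle for 240 s of 300 s
core1 = [(0, 2000.0), (300, 2180.0)]  # idle for 180 s of 300 s
usage = cpu_usage_percent([core0, core1])
print(round(usage, 1))
```

With these samples the cores are idle 80% and 60% of the time, so the average idle fraction is 0.7 and the instance reports about 30% CPU usage.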

9.4.2 Stat

# Display the current value
up{job="prometheus"}
count(up)
sum(http_requests_total)

9.4.3 Gauge

# Memory usage (%)
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) 
  / node_memory_MemTotal_bytes * 100

9.4.4 Table

# Service status
up{job=~"api.*|web.*"}

9.4.5 Heatmap

# Request latency distribution
sum by (le) (
  rate(http_request_duration_seconds_bucket[5m])
)
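Prometheus histogram buckets are cumulative: each `le` series counts every observation at or below its bound. A heatmap needs the value that falls inside each bucket, which is the difference between neighbouring bounds. The sketch below shows that conversion on invented numbers. `per_bucket_values` is a hypothetical helper, not a Grafana function, and whether Grafana performs this step for you depends on the query format chosen in the panel.

```python
def per_bucket_values(cumulative):
    """Convert cumulative `le` bucket values into per-bucket values.

    `cumulative` maps an upper bound (float, float("inf") for +Inf)
    to the cumulative rate of observations at or below that bound.
    """
    result = {}
    previous = 0.0
    for bound in sorted(cumulative):
        result[bound] = cumulative[bound] - previous
        previous = cumulative[bound]
    return result


# Hypothetical result of the query above at one timestamp (requests/s per `le`).
cumulative = {0.1: 5.0, 0.5: 8.0, 1.0: 9.5, float("inf"): 10.0}
buckets = per_bucket_values(cumulative)
print(buckets)
```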

9.5 Template Variables

9.5.1 Query Variables

{
  "name": "job",
  "type": "query",
  "datasource": "Prometheus",
  "query": "label_values(up, job)",
  "refresh": 1,
  "sort": 1
}

9.5.2 Multi-value Variables

{
  "name": "instance",
  "type": "query",
  "datasource": "Prometheus",
  "query": "label_values(up{job=~\"$job\"}, instance)",
  "multi": true,
  "includeAll": true,
  "allValue": ".*"
}
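The `=~` matcher matters here because a multi-value selection is substituted into the query as a regular expression alternation. The following sketch approximates that substitution. It is an assumption-laden illustration, not Grafana's implementation: `interpolate_multi_value` and `render_query` are hypothetical names, the plain string replacement does not handle variable names that share a prefix, and Python's `re.escape` only approximates the escaping Grafana applies.

```python
import re


def interpolate_multi_value(values):
    """Approximate how a multi-value selection is rendered for a =~ matcher."""
    escaped = [re.escape(v) for v in values]
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"


def render_query(template, variables):
    """Replace each $name placeholder with its rendered value."""
    for name, values in variables.items():
        template = template.replace("$" + name, interpolate_multi_value(values))
    return template


query = render_query('up{job=~"$job"}', {"job": ["api", "web"]})
print(query)
```

With an equality matcher (`job="$job"`) the rendered alternation would be compared literally and match nothing, which is why multi-value variables are normally paired with `=~`.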

9.5.3 Chained Variables

[
  {
    "name": "namespace",
    "type": "query",
    "datasource": "Prometheus",
    "query": "label_values(kube_pod_info, namespace)"
  },
  {
    "name": "pod",
    "type": "query",
    "datasource": "Prometheus",
    "query": "label_values(kube_pod_info{namespace=\"$namespace\"}, pod)"
  }
]

9.5.4 Custom Variables

{
  "name": "interval",
  "type": "custom",
  "options": [
    {"value": "1m", "text": "1m"},
    {"value": "5m", "text": "5m"},
    {"value": "15m", "text": "15m"}
  ]
}

9.6 Alerting

9.6.1 Panel Alerts

  1. Edit the panel and open the Alert tab
  2. Click Create Alert
  3. Configure the condition:
# Condition
WHEN avg() OF query(A) IS ABOVE 80

# Query A
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

9.6.2 Alert Rule Configuration

{
  "name": "HighCPU",
  "condition": "C",
  "data": [
    {
      "refId": "A",
      "query": {
        "params": ["A", "5m", "now"]
      },
      "datasourceUid": "prometheus",
      "model": {
        "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
        "intervalMs": 1000,
        "maxDataPoints": 43200,
        "refId": "A"
      }
    },
    {
      "refId": "B",
      "query": {
        "params": []
      },
      "datasourceUid": "-100",
      "model": {
        "expression": "A",
        "reducer": "mean",
        "refId": "B",
        "type": "reduce"
      }
    },
    {
      "refId": "C",
      "query": {
        "params": []
      },
      "datasourceUid": "-100",
      "model": {
        "conditions": [
          {
            "evaluator": {"params": [80], "type": "gt"},
            "operator": {"type": "and"},
            "query": {"params": ["B"]},
            "reducer": {"type": "avg"}
          }
        ],
        "expression": "B",
        "refId": "C",
        "type": "threshold"
      }
    }
  ],
  "noDataState": "NoData",
  "execErrState": "Error",
  "for": "5m"
}
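The rule above chains three steps: query A returns a series, B reduces it to a single number, and C compares that number with the threshold. The `for` field then requires the breach to persist before the alert fires. The sketch below walks through that flow on invented data. It is a simplified model, not Grafana's evaluator: it assumes a fixed evaluation interval and expresses the `for` duration as a number of evaluations (the hypothetical parameter `for_evaluations`).

```python
def reduce_mean(values):
    """Step B: collapse a series into a single number."""
    return sum(values) / len(values)


def above_threshold(value, threshold=80):
    """Step C: true when the reduced value exceeds the threshold."""
    return value > threshold


def alert_states(evaluations, for_evaluations):
    """Return the state after each evaluation.

    The breach must persist for `for_evaluations` further evaluations
    after the first one before the state moves from Pending to Alerting.
    """
    states = []
    consecutive = 0
    for series in evaluations:
        if above_threshold(reduce_mean(series)):
            consecutive += 1
            states.append("Alerting" if consecutive > for_evaluations else "Pending")
        else:
            consecutive = 0
            states.append("Normal")
    return states


# Hypothetical CPU usage samples returned by query A at four evaluations.
evaluations = [[90, 95], [85, 88], [91, 93], [40, 50]]
states = alert_states(evaluations, for_evaluations=2)
print(states)
```

With `for: 5m` and a one-minute evaluation interval, the equivalent setting in this model would be `for_evaluations=5`.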

9.7 Common Dashboard Examples

9.7.1 Node Exporter Dashboard

{
  "panels": [
    {
      "title": "CPU usage",
      "type": "timeseries",
      "targets": [
        {
          "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode='idle'}[5m])) * 100)"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              {"value": 0, "color": "green"},
              {"value": 70, "color": "yellow"},
              {"value": 85, "color": "red"}
            ]
          }
        }
      }
    },
    {
      "title": "Memory usage",
      "type": "gauge",
      "targets": [
        {
          "expr": "(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "max": 100,
          "thresholds": {
            "steps": [
              {"value": 0, "color": "green"},
              {"value": 70, "color": "yellow"},
              {"value": 85, "color": "red"}
            ]
          }
        }
      }
    },
    {
      "title": "Disk usage",
      "type": "gauge",
      "targets": [
        {
          "expr": "(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100"
        }
      ]
    }
  ]
}
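The `thresholds.steps` lists above map a reading to a color: the active step is the highest one whose `value` does not exceed the reading. A minimal sketch of that lookup, assuming the lower bound of each step is inclusive (`threshold_color` is a hypothetical helper, not part of Grafana):

```python
def threshold_color(value, steps):
    """Return the color of the highest step whose value is <= the reading."""
    color = None
    for step in sorted(steps, key=lambda s: s["value"]):
        if value >= step["value"]:
            color = step["color"]
    return color


# Same steps as the CPU and memory panels above.
steps = [
    {"value": 0, "color": "green"},
    {"value": 70, "color": "yellow"},
    {"value": 85, "color": "red"},
]

for reading in (42, 70, 90):
    print(reading, threshold_color(reading, steps))
```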

9.7.2 Kubernetes Dashboard

# Pod CPU usage
sum(rate(container_cpu_usage_seconds_total{container!="POD", container!=""}[5m])) by (pod)

# Pod memory usage
sum(container_memory_working_set_bytes{container!="POD", container!=""}) by (pod)

# CPU usage per namespace (assumes this recording rule is defined)
sum(namespace_cpu:container_cpu_usage_seconds_total:sum_rate) by (namespace)

# Number of pods per namespace
count by (namespace) (kube_pod_info)

9.8 Grafana API

9.8.1 Creating a Data Source

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  http://localhost:3000/api/datasources \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "access": "proxy",
    "url": "http://prometheus:9090",
    "isDefault": true
  }'
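For scripts that manage several Grafana instances, the same call can be prepared with Python's standard library. The sketch below only builds the request and does not send it. `build_create_datasource_request` is a hypothetical helper, and the fallback token `example-token` is a placeholder.

```python
import json
import os
import urllib.request


def build_create_datasource_request(base_url, api_key, prometheus_url):
    """Build (but do not send) the POST /api/datasources request."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "access": "proxy",
        "url": prometheus_url,
        "isDefault": True,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_create_datasource_request(
    "http://localhost:3000",
    os.environ.get("API_KEY", "example-token"),  # placeholder token
    "http://prometheus:9090",
)
print(request.get_method(), request.full_url)
# To send it (requires a running Grafana): urllib.request.urlopen(request)
```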

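9.8.2 Querying Data

# /api/ds/query expects a POST request with a JSON body
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  http://localhost:3000/api/ds/query \
  -d '{"from":"now-5m","to":"now","queries":[{"refId":"A","expr":"up","datasource":{"type":"prometheus","uid":"prometheus"}}]}'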
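When the query is issued from a script, it can help to assemble the request body as a data structure first. This sketch assumes the endpoint takes a POST with a JSON body that has `from`, `to`, and a `queries` list. `build_ds_query_body` is a hypothetical helper, and the data source UID `prometheus` is taken from the example in this section.

```python
import json


def build_ds_query_body(expr, datasource_uid, time_from="now-5m", time_to="now"):
    """Return a JSON-serializable body for a single Prometheus query."""
    return {
        "from": time_from,
        "to": time_to,
        "queries": [
            {
                "refId": "A",
                "expr": expr,
                "datasource": {"type": "prometheus", "uid": datasource_uid},
            }
        ],
    }


body = build_ds_query_body("up", "prometheus")
print(json.dumps(body, indent=2))
```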

9.8.3 Importing a Dashboard

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  http://localhost:3000/api/dashboards/import \
  -d '{
    "dashboard": {...},
    "overwrite": true,
    "message": "Import Node Exporter Dashboard"
  }'

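9.9 Chapter Summary

This chapter covered integrating Prometheus with Grafana:

  1. Grafana overview - installation and basic configuration
  2. Data source configuration - setting up the Prometheus data source
  3. Dashboard configuration - JSON format and importing
  4. Common panel types - time series, gauge, table, and others
  5. Template variables - query variables and chained variables
  6. Alerting - panel alerts and rule configuration
  7. Common dashboards - practical examples
  8. Grafana API - automated management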
