Chapter 10: Cluster Deployment

Learn how to deploy and operate a distributed Loki cluster.

Last updated: 2024-01-24

Loki Cluster Deployment

This chapter covers how to deploy and configure a distributed Loki cluster.

Cluster Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Loki Cluster Architecture                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│                        ┌─────────────┐                          │
│                        │   Gateway   │                          │
│                        │   (nginx)   │                          │
│                        └──────┬──────┘                          │
│                               │                                  │
│         ┌─────────────────────┼─────────────────────┐           │
│         │                     │                     │           │
│   ┌─────▼─────┐         ┌─────▼─────┐         ┌─────▼─────┐     │
│   │ Distributor│        │  Query    │         │   Ruler   │     │
│   │           │         │  Frontend │         │           │     │
│   └─────┬─────┘         └─────┬─────┘         └─────┬─────┘     │
│         │                     │                     │           │
│         └─────────────────────┼─────────────────────┘           │
│                               │                                  │
│         ┌─────────────────────┼─────────────────────┐           │
│         │                     │                     │           │
│   ┌─────▼─────┐         ┌─────▼─────┐         ┌─────▼─────┐     │
│   │ Ingester  │         │  Querier  │         │ Compactor │     │
│   │           │         │           │         │           │     │
│   └─────┬─────┘         └─────┬─────┘         └───────────┘     │
│         │                     │                                  │
│         └─────────────────────┼─────────────────────┘           │
│                               │                                  │
│                    ┌──────────▼──────────┐                      │
│                    │   Object Storage    │                      │
│                    │   (S3/GCS/MinIO)    │                      │
│                    └─────────────────────┘                      │
│                                                                 │
│                    ┌─────────────────────┐                      │
│                    │ Memberlist Cluster  │                      │
│                    │ (service discovery) │                      │
│                    └─────────────────────┘                      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Microservices Mode

Components

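Component        Description        Responsibilities
Distributor      Log dispatcher     Receives, validates, and shards incoming logs
Ingester         Log collector      Stores incoming logs and serves queries for that data
Query Frontend   Query front end    Schedules and parallelizes queries
Querier          Query executor     Executes queries
Compactor        Compactor          Handles compaction and retention
Ruler            Alerting engine    Evaluates alerting rules
Gateway          Gateway            Routes requests to the right component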

Helm Values Configuration

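# values-cluster.yaml
# Note: key names differ between Loki Helm charts and chart versions;
# compare against `helm show values grafana/loki-distributed` before applying.
---
# Cluster settings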
fullnameOverride: loki

replicas: 3

image:
  repository: grafana/loki
  tag: 2.9.0

# Microservices mode: run no single-binary replicas
singleBinary:
  replicas: 0

# Enable the microservice components
distributed:
  enabled: true
  memberlist:
    replicas: 3

# Gateway configuration
gateway:
  enabled: true
  replicas: 2
  service:
    type: ClusterIP
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - host: loki.example.com
        paths:
          - path: /
            pathType: Prefix

# Distributor configuration
distributor:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi

# Ingester configuration
ingester:
  replicas: 3
  persistence:
    enabled: true
    size: 10Gi
    storageClass: gp3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi

# Query Frontend configuration
queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi

# Querier configuration
querier:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi

# Compactor configuration
compactor:
  replicas: 1
  persistence:
    enabled: true
    size: 10Gi

# Ruler configuration
ruler:
  replicas: 1
  persistence:
    enabled: true
    size: 5Gi
  evaluation:
    interval: 15s

# Storage configuration
storage:
  bucketNames:
    chunks: loki-chunks
    ruler: loki-ruler
    admin: loki-admin
  s3:
    endpoint: s3.amazonaws.com
    region: us-east-1
    bucketnames: loki-data

# Memberlist configuration
memberlist:
  service:
    enabled: true
    port: 7946
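
For capacity planning, it can help to total the resource requests implied by the values above. The following is a minimal sketch in shell arithmetic that uses only the replica counts and requests from values-cluster.yaml. The helper name `total` is hypothetical, and Compactor, Ruler, and Gateway are left out because the file sets no requests for them.

```shell
#!/bin/sh
# total is a hypothetical helper (not part of Helm or Loki).
# Each argument is "replicas:request"; it prints the sum of replicas * request.
total() {
  sum=0
  for pair in "$@"; do
    replicas=${pair%%:*}
    request=${pair##*:}
    sum=$((sum + replicas * request))
  done
  echo "$sum"
}

# distributor, ingester, queryFrontend, querier: CPU requests in millicores
cpu_m=$(total 2:100 3:500 2:100 3:500)
# Same components: memory requests in Mi (1Gi = 1024Mi)
mem_mi=$(total 2:256 3:1024 2:256 3:1024)

echo "CPU requests:    ${cpu_m}m"
echo "Memory requests: ${mem_mi}Mi"
```

With these values the requests add up to 3400m CPU and 7168Mi memory, which is the minimum the scheduler has to find across the cluster's nodes.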

Kubernetes Deployment

Create the Namespace

kubectl create namespace loki-cluster

Install Loki

helm install loki grafana/loki-distributed \
  --namespace loki-cluster \
  -f values-cluster.yaml

Verify the Deployment

# List pods
kubectl get pods -n loki-cluster

# List services
kubectl get svc -n loki-cluster

# Check component status: show each pod's component label as an extra column
kubectl get pods -n loki-cluster -L app.kubernetes.io/component
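
Listing pods shows a snapshot; to block until the components are actually rolled out, something like the following sketch may help. The helper name `wait_for_rollouts` is hypothetical, and the workload names in the usage note are assumptions derived from `fullnameOverride: loki`, so they may differ in your release.

```shell
#!/bin/sh
# wait_for_rollouts is a hypothetical helper, not a kubectl feature.
# Usage: wait_for_rollouts <namespace> <kind/name>...
wait_for_rollouts() {
  namespace=$1
  shift
  for workload in "$@"; do
    # kubectl rollout status blocks until the rollout completes or times out
    kubectl rollout status "$workload" -n "$namespace" --timeout=300s || return 1
  done
}
```

A possible invocation, assuming those workload names exist: `wait_for_rollouts loki-cluster statefulset/loki-ingester deployment/loki-distributor`.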

Read/Write Separation

Write Path

Client → Gateway → Distributor → Ingester → Object Storage

Read Path

Client → Gateway → Query Frontend → Querier → Ingester/Object Storage
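
The two paths above are separated at the Gateway by request path. The following simplified sketch shows how such a mapping might look. The function name `route_for` is hypothetical, and the routing rules are an assumption modeled on Loki's HTTP API paths; the gateway configuration shipped with your chart version may differ.

```shell
#!/bin/sh
# route_for is a hypothetical helper: it prints the component that a
# gateway would plausibly forward a given Loki HTTP API path to.
route_for() {
  case "$1" in
    /loki/api/v1/push)  echo "distributor" ;;      # write path
    /loki/api/v1/tail)  echo "querier" ;;          # live tailing
    /prometheus/api/v1/rules*|/prometheus/api/v1/alerts*) echo "ruler" ;;
    /loki/api/*)        echo "query-frontend" ;;   # all other reads
    *)                  echo "unknown" ;;
  esac
}

for path in /loki/api/v1/push /loki/api/v1/query_range /loki/api/v1/tail; do
  echo "$path -> $(route_for "$path")"
done
```

The more specific patterns come first because `case` takes the first match, which is the same reason exact-match locations precede catch-all ones in a reverse proxy configuration.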

Example Configuration

# Read/write separation: scale the write and read paths independently
distributor:
  replicas: 3
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70

querier:
  replicas: 3
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    targetCPUUtilizationPercentage: 70

High Availability

Replica Counts

# Minimal high-availability configuration
ingester:
  replicas: 3
  persistence:
    enabled: true
    storageClass: gp3
    size: 50Gi

querier:
  replicas: 3

distributor:
  replicas: 2
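
Three Ingesters is a common minimum because of how write quorum works. The sketch below assumes Loki's replication factor setting (not shown in the values above) and that a write succeeds once floor(rf / 2) + 1 Ingesters acknowledge it; the helper names are illustrative.

```shell
#!/bin/sh
# quorum and tolerated_failures are illustrative helpers.
# quorum: acknowledgements needed for a write, floor(rf / 2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }
# tolerated_failures: replicas that may be down while writes still succeed.
tolerated_failures() { echo $(( $1 - $(quorum "$1") )); }

for rf in 1 2 3 5; do
  echo "replication_factor=$rf quorum=$(quorum "$rf") tolerated_failures=$(tolerated_failures "$rf")"
done
```

Under these assumptions, a replication factor of 3 needs 2 acknowledgements and tolerates 1 unavailable Ingester, while a factor of 2 tolerates none.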

Storage Configuration

# Object storage configuration
storage:
  type: s3
  s3:
    endpoint: s3.us-east-1.amazonaws.com
    region: us-east-1
    bucketnames: loki-ha-data
    s3forcepathstyle: false
    http_config:
      idle_conn_timeout: 90s

# Index storage
schemaConfig:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_index_
        # boltdb-shipper requires a 24h index period
        period: 24h
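
The index prefix and period together determine the names of the index tables written to object storage. As a rough sketch, assuming a 24h index period and that a table name is the prefix followed by the Unix time divided by the period, the table covering a given instant can be computed as follows. The helper name `index_table` is hypothetical.

```shell
#!/bin/sh
# index_table is a hypothetical helper.
# Usage: index_table <prefix> <epoch_seconds> <period_seconds>
index_table() {
  prefix=$1
  epoch_seconds=$2
  period_seconds=$3
  echo "${prefix}$(( epoch_seconds / period_seconds ))"
}

# 2023-01-01T00:00:00Z is 1672531200 seconds since the Unix epoch; 24h = 86400s
index_table loki_index_ 1672531200 86400
```

This prints `loki_index_19358`, which can help when checking which index tables a backup or a retention run should contain.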

Load Balancing

Gateway Ingress

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: loki-ingress
  namespace: loki-cluster
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  ingressClassName: nginx
  rules:
    - host: loki.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: loki-gateway
                port:
                  # Must match the port exposed by the loki-gateway Service
                  number: 3100

Scaling

HPA Configuration

# ingester-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loki-ingester
  namespace: loki-cluster
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: loki-ingester
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
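
To see what replica count the HPA above would aim for, the scaling rule from the Kubernetes documentation, desired = ceil(current × metric / target), can be worked through by hand. The sketch below uses integer shell arithmetic. The helper name `desired_replicas` is illustrative, and the utilization figures are made-up inputs; the controller's tolerance band and stabilization window are not modeled.

```shell
#!/bin/sh
# desired_replicas is an illustrative helper implementing
# ceil(current * metric / target), clamped to [min, max].
# Usage: desired_replicas <current> <metric_pct> <target_pct> <min> <max>
desired_replicas() {
  current=$1
  metric=$2
  target=$3
  min=$4
  max=$5
  # Integer ceiling division
  desired=$(( (current * metric + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# Hypothetical load: 3 replicas at 90% CPU (target 70) and 60% memory (target 80)
cpu=$(desired_replicas 3 90 70 3 10)
mem=$(desired_replicas 3 60 80 3 10)

# With several metrics, the HPA uses the largest proposal
if [ "$cpu" -gt "$mem" ]; then echo "$cpu"; else echo "$mem"; fi
```

For these inputs the CPU metric proposes 4 replicas and the memory metric 3, so the result is 4.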

Manual Scaling

# Scale up
kubectl scale statefulset loki-ingester -n loki-cluster --replicas=5

# Scale down
kubectl scale statefulset loki-ingester -n loki-cluster --replicas=3

Monitoring

Prometheus Metrics

# Enable metrics
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    interval: 15s

Dashboards

# Import Grafana dashboards
dashboards:
  enabled: true
  dashboardLabels:
    grafana_dashboard: "1"

Backup and Restore

Backup Strategy

# Back up object storage data
aws s3 sync s3://loki-chunks ./backup/chunks
aws s3 sync s3://loki-index ./backup/index

# Back up the Compactor's local data to the local machine
kubectl exec -n loki-cluster loki-compactor-0 -- \
  tar czf /tmp/backup.tar.gz /var/loki
kubectl cp loki-cluster/loki-compactor-0:/tmp/backup.tar.gz ./backup.tar.gz

Restore Strategy

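# Restore object storage data (kubectl cp cannot write to S3; use the AWS CLI)
aws s3 sync ./backup/chunks s3://loki-chunks
aws s3 sync ./backup/index s3://loki-index

# Restore local data: copy the archive back into the pod, then unpack it
kubectl cp ./backup.tar.gz loki-cluster/loki-compactor-0:/tmp/backup.tar.gz
kubectl exec -n loki-cluster loki-compactor-0 -- \
  tar xzf /tmp/backup.tar.gz -C /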

Next Steps

The next chapter covers security configuration.