Chapter 10: Cluster Deployment
Learn how to deploy and operate a distributed Loki cluster.
Last updated: 2024-01-24
Loki Cluster Deployment
This chapter covers deploying and configuring Loki as a distributed (microservices-mode) cluster.
Cluster Architecture
┌─────────────────────────────────────────────────────────────────┐
│                    Loki Cluster Architecture                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                        ┌─────────────┐                          │
│                        │   Gateway   │                          │
│                        │   (nginx)   │                          │
│                        └──────┬──────┘                          │
│                               │                                 │
│          ┌────────────────────┼────────────────────┐            │
│          │                    │                    │            │
│    ┌─────▼──────┐       ┌─────▼──────┐       ┌─────▼──────┐     │
│    │Distributor │       │   Query    │       │   Ruler    │     │
│    │            │       │  Frontend  │       │            │     │
│    └─────┬──────┘       └─────┬──────┘       └─────┬──────┘     │
│          │                    │                    │            │
│          └────────────────────┼────────────────────┘            │
│                               │                                 │
│          ┌────────────────────┼────────────────────┐            │
│          │                    │                    │            │
│    ┌─────▼──────┐       ┌─────▼──────┐       ┌─────▼──────┐     │
│    │  Ingester  │       │  Querier   │       │ Compactor  │     │
│    │            │       │            │       │            │     │
│    └─────┬──────┘       └─────┬──────┘       └─────┬──────┘     │
│          │                    │                    │            │
│          └────────────────────┼────────────────────┘            │
│                               │                                 │
│                    ┌──────────▼──────────┐                      │
│                    │   Object Storage    │                      │
│                    │   (S3/GCS/MinIO)    │                      │
│                    └─────────────────────┘                      │
│                                                                 │
│                    ┌─────────────────────┐                      │
│                    │ Memberlist Cluster  │                      │
│                    │ (service discovery) │                      │
│                    └─────────────────────┘                      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
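Microservices Mode
Components
| Component | Role | Responsibilities |
|---|---|---|
| Distributor | Write-path entry point | Receives incoming logs, validates them, and shards streams across ingesters |
| Ingester | Write-path buffer | Holds recent logs in memory, flushes chunks to object storage, and serves recent data to queries |
| Query Frontend | Read-path entry point | Splits and schedules queries so they run in parallel |
| Querier | Query executor | Executes LogQL queries against ingesters and object storage |
| Compactor | Background maintenance | Compacts index files and applies retention |
| Ruler | Rule evaluator | Evaluates alerting rules |
| Gateway | Reverse proxy | Routes incoming requests to the appropriate component |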
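The Distributor's sharding step can be pictured as a consistent-hash ring. Each stream is hashed, and the ingester that owns the first ring token at or above that hash receives the stream. The shell sketch below is only an illustration. FNV-1a is an assumed hash function, and the `TOKEN:OWNER` ring format and function names are hypothetical. Real Loki also mixes the tenant ID into the hash and writes each stream to `replication_factor` ingesters, not just one.

```shell
#!/usr/bin/env bash
# Illustrative consistent-hash ring lookup (a sketch, not Loki's actual code).

# fnv1a32 STRING: 32-bit FNV-1a hash of an ASCII string, printed in decimal.
fnv1a32() {
  local s="$1" h=2166136261 i c
  for (( i = 0; i < ${#s}; i++ )); do
    printf -v c '%d' "'${s:i:1}"               # byte value of the i-th character
    h=$(( (h ^ c) * 16777619 & 0xFFFFFFFF ))   # XOR, multiply by the FNV prime, keep 32 bits
  done
  echo "$h"
}

# ring_owner HASH TOKEN:OWNER... (hypothetical helper; tokens in ascending order)
# Prints the owner of the first token >= HASH, wrapping around to the first token.
ring_owner() {
  local hash="$1"; shift
  local entry
  for entry in "$@"; do
    if (( hash <= ${entry%%:*} )); then
      echo "${entry#*:}"
      return 0
    fi
  done
  echo "${1#*:}"   # hash is past the largest token: wrap to the start of the ring
}

# Example: three ingesters splitting the 32-bit hash space.
# ring_owner "$(fnv1a32 '{app="nginx"}')" \
#   1431655765:ingester-0 2863311530:ingester-1 4294967295:ingester-2
```

With tokens spread across the whole hash space, adding an ingester only moves the streams between its new token and its predecessor. That property is why a ring is used instead of a plain `hash % N`.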
Helm values configuration
# values-cluster.yaml
# Key names differ between the grafana/loki and grafana/loki-distributed charts
# and across chart versions; check them against the chart's own values.yaml.
---
# Cluster settings
fullnameOverride: loki
replicas: 3
image:
  repository: grafana/loki
  tag: 2.9.0
# Microservices mode: disable the single-binary deployment
singleBinary:
  replicas: 0
# Enable the microservice components
distributed:
  enabled: true
# Gateway
gateway:
  enabled: true
  replicas: 2
  service:
    type: ClusterIP
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - host: loki.example.com
        paths:
          - path: /
            pathType: Prefix
# Distributor
distributor:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi
# Ingester: 3 replicas match Loki's default replication factor of 3
ingester:
  replicas: 3
  persistence:
    enabled: true
    size: 10Gi
    storageClass: gp3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi
# Query Frontend
queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi
# Querier
querier:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi
# Compactor: run exactly one instance
compactor:
  replicas: 1
  persistence:
    enabled: true
    size: 10Gi
# Ruler
ruler:
  replicas: 1
  persistence:
    enabled: true
    size: 5Gi
  evaluation:
    interval: 15s
# Storage: the chunk bucket name must be the same in both settings
storage:
  bucketNames:
    chunks: loki-chunks
    ruler: loki-ruler
    admin: loki-admin
  s3:
    endpoint: s3.amazonaws.com
    region: us-east-1
    bucketnames: loki-chunks
# Memberlist (gossip-based service discovery); 7946 is Loki's default memberlist port
memberlist:
  service:
    enabled: true
    port: 7946
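Kubernetes Deployment
Create the namespace
kubectl create namespace loki-cluster
Install Loki
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-distributed \
  --namespace loki-cluster \
  -f values-cluster.yaml
Verify the deployment
# List the pods
kubectl get pods -n loki-cluster
# List the services
kubectl get svc -n loki-cluster
# Show each pod's component label as an extra column
kubectl get pods -n loki-cluster -L app.kubernetes.io/component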
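Read/Write Separation
Write path
Client → Gateway → Distributor → Ingester → Object Storage (ingesters flush completed chunks)
Read path
Client → Gateway → Query Frontend → Querier → Ingester (recent, unflushed data) / Object Storage (flushed data)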
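Both paths can be exercised through the gateway using Loki's HTTP API: `/loki/api/v1/push` for writes and `/loki/api/v1/query_range` for reads. This sketch makes three assumptions. The gateway is reachable at `http://loki.example.com`, which the hypothetical `LOKI_URL` variable can override. Multi-tenant authentication is disabled. Log lines contain no characters that need JSON escaping. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Gateway URL is an assumption; override it with LOKI_URL.
LOKI_URL="${LOKI_URL:-http://loki.example.com}"

# build_push_payload LABEL_KEY LABEL_VALUE LINE TIMESTAMP_NS
# Prints a single-stream JSON body for /loki/api/v1/push.
# LINE must not contain characters that need JSON escaping.
build_push_payload() {
  printf '{"streams":[{"stream":{"%s":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$1" "$2" "$4" "$3"
}

# Write path: client -> gateway -> distributor -> ingester.
# `date +%s%N` (nanoseconds) is a GNU date feature.
push_line() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    --data "$(build_push_payload job demo "$1" "$(date +%s%N)")" \
    "${LOKI_URL}/loki/api/v1/push"
}

# Read path: client -> gateway -> query frontend -> querier.
query_demo() {
  curl -sS -G "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode 'query={job="demo"}' \
    --data-urlencode 'limit=10'
}
```

For example, `push_line "hello loki" && query_demo` writes one line and reads it back. The query may return nothing for a moment while the ingesters accept the write.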
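Configuration example
# Scale the write path (distributor) and the read path (querier) independently.
# When autoscaling is enabled, the HPA manages the replica count.
distributor:
  replicas: 3
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
querier:
  replicas: 3
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    targetCPUUtilizationPercentage: 70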
High Availability
Replica counts
# Minimal high-availability configuration
ingester:
  replicas: 3
  persistence:
    enabled: true
    storageClass: gp3
    size: 50Gi
querier:
  replicas: 3
distributor:
  replicas: 2
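Three ingesters is the minimum here for two reasons. Loki replicates each stream to `replication_factor` ingesters (3 by default), and a write succeeds once a majority, floor(RF / 2) + 1, acknowledge it. The small shell sketch below only does that arithmetic, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# write_quorum RF: acknowledgements needed for a write to succeed (a majority).
write_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures RF: ingesters that may be unavailable while writes still succeed.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the trade-off for a few replication factors.
for rf in 1 3 5; do
  printf 'replication_factor=%d quorum=%d tolerated_failures=%d\n' \
    "$rf" "$(write_quorum "$rf")" "$(tolerated_failures "$rf")"
done
```

With the default of 3, losing one ingester (for example, during a rolling restart) does not block writes, but losing two does.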
Storage configuration
# Object storage
storage:
  type: s3
  s3:
    endpoint: s3.us-east-1.amazonaws.com
    region: us-east-1
    bucketnames: loki-ha-data
    s3forcepathstyle: false
    http_config:
      idle_conn_timeout: 90s
# Index schema
schemaConfig:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_index_
        # boltdb-shipper requires a 24h index period
        period: 24h
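With `period: 24h`, boltdb-shipper writes one index table per day. Each table name is the prefix followed by the period number, which is the Unix time divided by the period length in seconds. Assuming that naming scheme, the hypothetical helper below computes which table covers a given timestamp. This is handy when inspecting the bucket.

```shell
#!/usr/bin/env bash
# index_table_name PREFIX PERIOD_SECONDS UNIX_SECONDS
# Period number = floor(UNIX_SECONDS / PERIOD_SECONDS), appended to PREFIX.
index_table_name() {
  local prefix="$1" period="$2" ts="$3"
  echo "${prefix}$(( ts / period ))"
}

# Today's table for the schema above (24h = 86400 s):
# index_table_name loki_index_ 86400 "$(date -u +%s)"
```

For example, 2024-01-01T00:00:00Z (Unix time 1704067200) falls in `loki_index_19723`, and the next table starts exactly 86400 seconds later.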
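Load Balancing
Gateway Ingress
# ingress.yaml
# Use either this manifest or gateway.ingress in the Helm values, not both.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: loki-ingress
  namespace: loki-cluster
  annotations:
    # Allow large push batches
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    # Give long-running queries up to 300 seconds
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  ingressClassName: nginx
  rules:
    - host: loki.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: loki-gateway
                port:
                  # The gateway Service listens on port 80 by default; match gateway.service.port
                  number: 80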
Scaling
HPA configuration
# ingester-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loki-ingester
  namespace: loki-cluster
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: loki-ingester
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
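The HPA controller works out a proposal for each metric as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). It skips the change if the ratio is within the default 10% tolerance. When several metrics are configured, as with CPU and memory above, it takes the largest proposal. Finally it clamps the result to minReplicas/maxReplicas. The shell sketch below reproduces only that arithmetic, using integer utilization percentages. It ignores stabilization windows, scaling policies, and unready pods, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# hpa_desired CURRENT UTIL TARGET MIN MAX  (utilizations as integer percentages)
hpa_desired() {
  local current="$1" util="$2" target="$3" min="$4" max="$5" desired
  if (( 10 * util >= 9 * target && 10 * util <= 11 * target )); then
    desired="$current"                                    # within 10% tolerance: keep
  else
    desired=$(( (current * util + target - 1) / target )) # ceil(current * util / target)
  fi
  if (( desired < min )); then desired="$min"; fi
  if (( desired > max )); then desired="$max"; fi
  echo "$desired"
}

# hpa_desired_multi CURRENT MIN MAX UTIL1 TARGET1 [UTIL2 TARGET2 ...]
# Computes a proposal per metric and keeps the largest.
hpa_desired_multi() {
  local current="$1" min="$2" max="$3" best=0 proposal
  shift 3
  while (( $# >= 2 )); do
    proposal=$(hpa_desired "$current" "$1" "$2" "$min" "$max")
    if (( proposal > best )); then best="$proposal"; fi
    shift 2
  done
  echo "$best"
}

# Ingester HPA above: 3 replicas at 90% CPU (target 70) and 60% memory (target 80).
# hpa_desired_multi 3 3 10 90 70 60 80   # prints 4
```

In this example, CPU alone asks for 4 replicas and memory asks for fewer. The HPA follows the larger value, so the busiest resource drives scaling.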
Manual scaling
# Scale up
kubectl scale statefulset loki-ingester -n loki-cluster --replicas=5
# Scale down
kubectl scale statefulset loki-ingester -n loki-cluster --replicas=3
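Monitoring
Prometheus Metrics
# Enable metrics (a ServiceMonitor requires the Prometheus Operator CRDs)
metrics:
  enabled: true
serviceMonitor:
  enabled: true
  interval: 15s
Dashboards
# Import Grafana dashboards
dashboards:
  enabled: true
  # Label the Grafana sidecar uses to discover dashboard ConfigMaps
  dashboardLabels:
    grafana_dashboard: "1"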
Backup and Restore
Backup
# Back up the object storage buckets
aws s3 sync s3://loki-chunks ./backup/chunks
aws s3 sync s3://loki-index ./backup/index
# Archive the compactor's local data directory and copy it to the local machine
kubectl exec -n loki-cluster loki-compactor-0 -- \
  tar czf /tmp/backup.tar.gz /var/loki
kubectl cp loki-cluster/loki-compactor-0:/tmp/backup.tar.gz ./backup.tar.gz
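To keep successive backups apart, the bucket syncs can be wrapped in a small script that writes each run to its own timestamped directory. This is a sketch only. `backup_buckets` and the `DRY_RUN` switch are hypothetical, and the bucket names are the ones used above; substitute the names from your storage configuration.

```shell
#!/usr/bin/env bash
# backup_buckets DEST BUCKET... : sync each bucket into DEST/<bucket>.
# With DRY_RUN=1 the aws commands are printed instead of executed.
backup_buckets() {
  local dest="$1" bucket
  shift
  for bucket in "$@"; do
    local cmd=(aws s3 sync "s3://${bucket}" "${dest}/${bucket}")
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}

# Example: one directory per run, named by UTC timestamp.
# backup_buckets "./backup/$(date -u +%Y%m%dT%H%M%SZ)" loki-chunks loki-index
```

Running with `DRY_RUN=1` first is a cheap way to confirm the source and destination paths before copying large buckets.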
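Restore
# Sync the backed-up objects back into the buckets
aws s3 sync ./backup/chunks s3://loki-chunks
aws s3 sync ./backup/index s3://loki-index
# Copy the archive into the pod, then extract it (tar stored the paths relative to /)
kubectl cp ./backup.tar.gz loki-cluster/loki-compactor-0:/tmp/backup.tar.gz
kubectl exec -n loki-cluster loki-compactor-0 -- \
  tar xzf /tmp/backup.tar.gz -C /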
Next Steps
Next, we'll look at security configuration.
👉 Security Configuration