Chapter 14: Best Practices
Learn efficient best practices for organizing and writing Ansible projects.
Last updated: 2024-01-28
Ansible Best Practices
This chapter summarizes efficient best practices for organizing Ansible projects and writing Ansible code.
Project Structure
Recommended directory layout
project/
├── ansible.cfg              # Ansible configuration file
├── requirements.yml         # Role and collection dependencies
├── site.yml                 # Main playbook
├── hosts                    # Inventory file
├── host_vars/               # Host variables
│   ├── host1/
│   │   └── vars.yml
│   └── host2/
├── group_vars/              # Group variables
│   ├── all/
│   │   ├── vault.yml        # Encrypted sensitive data
│   │   └── common.yml       # Common variables
│   ├── production/
│   │   ├── vault.yml
│   │   └── env.yml
│   └── staging/
│       ├── vault.yml
│       └── env.yml
├── library/                 # Custom modules
│   └── my_module.py
├── module_utils/            # Custom module utilities
├── plugins/                 # Custom plugins
│   ├── callback/
│   ├── filter/
│   └── inventory/
├── roles/                   # Roles
│   ├── common/
│   ├── nginx/
│   ├── mysql/
│   └── app/
├── playbooks/               # Playbooks
│   ├── webserver.yml
│   ├── dbserver.yml
│   └── base.yml
├── vars/                    # Variable files
└── files/                   # Static files
    ├── scripts/
    └── certs/
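The requirements.yml file in the layout above declares the external roles and collections a project depends on. A minimal sketch follows; the role and the two collections are assumptions chosen purely for illustration, not dependencies named by this chapter.

```yaml
# requirements.yml (illustrative; the listed dependencies are assumptions)
---
roles:
  # A role published on Ansible Galaxy
  - name: geerlingguy.nginx

collections:
  # Collections are resolved from Ansible Galaxy by default
  - name: community.general
  - name: community.mysql
```

These would typically be installed with `ansible-galaxy role install -r requirements.yml` and `ansible-galaxy collection install -r requirements.yml`. Adding a `version` key to each entry keeps installs reproducible.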
Inventory Organization
1. Use a directory structure
inventory/
├── hosts.ini                # Main inventory
├── group_vars/
│   ├── all.yml              # Shared by all hosts
│   ├── production.yml       # Production environment
│   └── staging.yml          # Staging (test) environment
└── host_vars/
    ├── web1.yml
    └── db1.yml
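2. Separate environments
# inventory/production/hosts
[production:children]
webservers
dbservers
appservers

[webservers]
prod-web-[01:10].example.com

[dbservers]
prod-db-[01:05].example.com

# Every group listed under :children must be defined, even if it is still empty
[appservers]

[production:vars]
# "environment" is a reserved Ansible keyword, so use a different variable name
deploy_env=production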
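The same production inventory can also be expressed in YAML, which some teams find easier to review in diffs. This is a sketch under stated assumptions: the file name `hosts.yml` is hypothetical, and `deploy_env` is an illustrative variable name chosen because `environment` is a reserved Ansible keyword.

```yaml
# inventory/production/hosts.yml (hypothetical file name)
---
all:
  children:
    production:
      children:
        webservers:
          hosts:
            prod-web-[01:10].example.com:
        dbservers:
          hosts:
            prod-db-[01:05].example.com:
        appservers:   # no hosts are listed for this group in the example
      vars:
        deploy_env: production   # illustrative name; avoids the reserved keyword "environment"
```

Running `ansible-inventory -i inventory/production/hosts.yml --graph` prints the resulting group tree, which is a quick way to confirm the structure parses as intended.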
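3. Separate sensitive data
# group_vars/all/vault.yml (encrypted)
---
vault_db_password: "secure_password"
vault_api_key: "secret_key"
# group_vars/all/public.yml (plain text)
---
# "environment" is a reserved Ansible keyword, so the variable is named deploy_env
deploy_env: production
db_host: prod-db.example.com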
Writing Playbooks
1. Naming conventions
# ✅ Recommended
- name: Configure webserver
  hosts: webservers
  become: yes
# ❌ Avoid
- name: do stuff
  hosts: all
2. Organize with roles and imported playbooks
# site.yml
---
- name: Deploy infrastructure
  import_playbook: playbooks/base.yml
- name: Deploy webservers
  import_playbook: playbooks/webserver.yml
- name: Deploy databases
  import_playbook: playbooks/dbserver.yml
3. Task organization
# Playbook structure
---
- name: Deploy application
  hosts: webservers
  become: yes
  # Variables
  vars:
    app_version: "2.0.0"
  # Pre-tasks (run before roles)
  pre_tasks:
    - name: Gather facts
      setup:
  # Roles
  roles:
    - nginx
    - app
  # Tasks (run after roles)
  tasks:
    - name: Final configuration
      template:
        src: final.conf.j2
        dest: /etc/myapp/final.conf
      notify: Reload nginx
  # Post-tasks
  post_tasks:
    - name: Verify deployment
      uri:
        url: "http://{{ inventory_hostname }}"
        status_code: 200
  # Handlers (run only when notified by a changed task)
  handlers:
    - name: Reload nginx
      service:
        name: nginx
        state: reloaded
4. Use tags
tasks:
  - name: Install packages
    apt:
      name: "{{ packages }}"
    tags:
      - install
      - packages
  - name: Configure application
    template:
      src: app.conf.j2
      dest: /etc/myapp/app.conf
    tags:
      - config
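Besides custom tags, Ansible reserves the special tags `always` and `never`. The sketch below shows how they might be combined with the tags above; the cache path and the `cache_reset` tag name are hypothetical, and the commands in the trailing comments illustrate typical invocations.

```yaml
tasks:
  - name: Show the version being deployed
    ansible.builtin.debug:
      msg: "Deploying version {{ app_version | default('unknown') }}"
    tags:
      - always        # runs regardless of --tags, unless explicitly skipped

  - name: Remove cached application data
    ansible.builtin.file:
      path: /var/cache/myapp    # hypothetical path
      state: absent
    tags:
      - never         # skipped unless one of this task's other tags is requested
      - cache_reset   # hypothetical tag name

# Example invocations:
#   ansible-playbook site.yml --tags config          # only tasks tagged "config" (plus "always")
#   ansible-playbook site.yml --skip-tags install    # everything except "install"
#   ansible-playbook site.yml --tags cache_reset     # enables the "never" task
#   ansible-playbook site.yml --list-tags            # list the tags a playbook defines
```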
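Variable Management
1. Naming conventions
# ✅ Recommended: use a prefix and descriptive names
nginx_version: "1.24.0"
mysql_max_connections: 1000
app_database_name: myapp
# ❌ Avoid: generic names
var1: value
config: value
2. Variable scope
# role defaults/ - lowest precedence, meant to be overridden
# role vars/     - internal to the role; outranks inventory and play vars, so not meant to be overridden
# inventory      - environment-specific settings (group_vars, host_vars)
# playbook       - settings specific to one play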
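To make the scopes above concrete, here is a small sketch of one setting defined at several levels. The role name `app`, the file paths, and `app_config_dir` are assumptions for illustration only.

```yaml
---
# roles/app/defaults/main.yml: lowest precedence, safe to override
app_port: 8080
app_log_level: "INFO"

---
# inventory/group_vars/production.yml: overrides the role default for one environment
app_port: 9090

---
# roles/app/vars/main.yml: role-internal values; these outrank inventory and play vars
app_config_dir: /etc/myapp    # hypothetical internal setting
```

With these files, hosts in the production group would see `app_port` as 9090, while other hosts fall back to 8080. Extra vars passed on the command line (for example `-e app_port=7070`) take precedence over all of them.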
3. Protect sensitive data with Vault
# group_vars/all/vault.yml (encrypted)
---
vault_db_password: "secret"
vault_api_key: "key"
# Usage: reference the vaulted value from an unencrypted variables file
db_password: "{{ vault_db_password }}"
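Once the vaulted value is exposed through a plain variable such as `db_password`, tasks can consume it like any other variable. A minimal sketch, assuming a hypothetical template `db.conf.j2` and destination path:

```yaml
- name: Render database credentials
  ansible.builtin.template:
    src: db.conf.j2             # hypothetical template that references {{ db_password }}
    dest: /etc/myapp/db.conf    # hypothetical destination
    owner: root
    group: root
    mode: "0600"                # readable by the owner only
  no_log: true                  # keep the rendered secret out of logs and diffs
```

Keeping the indirection (`db_password` pointing at `vault_db_password`) means a search for the variable name still finds its definition even though the vault file itself is encrypted.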
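Developing Roles
1. Keep roles small
# ❌ One large role that contains everything
# ✅ Split by function
roles/
├── common/                  # Base configuration
├── nginx/                   # Web server
├── app/                     # Application deployment
├── db/                      # Database
└── monitoring/              # Monitoring
2. Use default values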
# defaults/main.yml
---
app_port: 8080
app_workers: "{{ ansible_facts['processor_vcpus'] | default(1) }}"
app_log_level: "INFO"
3. Document clearly
# defaults/main.yml
---
# Application settings
app_name: myapp
app_version: "1.0.0"
# Server settings
app_host: "0.0.0.0"
app_port: 8080
# Database settings
app_db_host: localhost
app_db_port: 3306
app_db_name: myapp
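Comments in defaults/main.yml document variables for human readers. Ansible can additionally validate role inputs when a role ships a meta/argument_specs.yml file (supported in ansible-core 2.11 and later). The following is a sketch for a subset of the variables above; the role name `app` and the descriptions are assumptions.

```yaml
# roles/app/meta/argument_specs.yml
---
argument_specs:
  main:
    short_description: Deploy the application
    options:
      app_name:
        type: str
        default: myapp
        description: Name of the application.
      app_port:
        type: int
        default: 8080
        description: TCP port the application listens on.
      app_db_host:
        type: str
        default: localhost
        description: Database host name.
```

With this file in place, the role fails early when a caller passes a value of the wrong type, and `ansible-doc -t role app` can display the documented options.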
Error Handling
1. Use block and rescue
tasks:
  - name: Deploy application
    block:
      - name: Backup current version
        command: /opt/app/backup.sh
      - name: Deploy new version
        command: /opt/app/deploy.sh
      - name: Verify deployment
        command: /opt/app/verify.sh
    rescue:
      - name: Rollback on failure
        command: /opt/app/rollback.sh
      - name: Notify failure
        debug:
          msg: "Deployment failed, rolled back"
    always:
      - name: Cleanup
        file:
          path: /tmp/deploy_temp
          state: absent
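One detail worth knowing: when a rescue section completes successfully, Ansible treats the host as recovered and the play continues. If a rolled-back deployment should still count as a failure, the rescue section can report the cause and then fail explicitly. A sketch using the same hypothetical scripts as above:

```yaml
tasks:
  - name: Deploy application
    block:
      - name: Deploy new version
        ansible.builtin.command: /opt/app/deploy.sh
    rescue:
      - name: Rollback on failure
        ansible.builtin.command: /opt/app/rollback.sh

      - name: Report what failed
        ansible.builtin.debug:
          msg: >-
            Task '{{ ansible_failed_task.name }}' failed:
            {{ ansible_failed_result.msg | default('no message') }}

      - name: Mark the host as failed after the rollback
        ansible.builtin.fail:
          msg: "Deployment failed and was rolled back"
```

`ansible_failed_task` and `ansible_failed_result` are variables Ansible sets inside a rescue section to describe the task that triggered it.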
2. Ignore expected errors
tasks:
  - name: Stop service if running
    service:
      name: myapp
      state: stopped
    ignore_errors: yes
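Because `ignore_errors` hides every failure of a task, including unexpected ones, a narrower alternative is to check the precondition first. The sketch below assumes systemd-managed hosts, where `service_facts` reports units under names such as `myapp.service`.

```yaml
tasks:
  - name: Collect service state
    ansible.builtin.service_facts:

  - name: Stop myapp only if the unit exists
    ansible.builtin.service:
      name: myapp
      state: stopped
    when: "'myapp.service' in ansible_facts.services"
```

With this variant, a genuine failure to stop an existing service still stops the play instead of being silently ignored.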
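3. Conditional execution
tasks:
  - name: Run database migration
    command: /opt/app/migrate.sh
    # Do not combine this condition with run_once: run_once picks the first host
    # in the batch, and the task is skipped everywhere if that host fails the condition
    when: inventory_hostname == groups['dbservers'][0]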
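Performance Optimization
1. Disable unnecessary fact gathering
---
- name: Fast playbook
  hosts: all
  gather_facts: no
# Or gather selectively
- name: Selective facts
  hosts: all
  gather_facts: yes
  gather_subset:
    - min        # minimal set, includes distribution facts
    - hardware   # includes memory facts such as ansible_memory_mb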
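Another option is to disable automatic gathering and call the `setup` module only at the point where a fact is needed. A minimal sketch; the choice of distribution facts is an assumption made for illustration.

```yaml
- name: Fast playbook with on-demand facts
  hosts: all
  gather_facts: false
  tasks:
    - name: Gather only distribution facts when needed
      ansible.builtin.setup:
        gather_subset:
          - "!all"     # skip the optional subsets; the minimal set is still collected
        filter: "ansible_distribution*"

    - name: Use the gathered fact
      ansible.builtin.debug:
        msg: "Running on {{ ansible_facts['distribution'] }}"
```

Here `gather_subset` limits what is collected on the host, while `filter` limits which of the collected facts are returned.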
2. Optimize SSH connections
# ansible.cfg
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
3. Parallel execution
# Raise the number of parallel forks
ansible-playbook site.yml -f 50
# ansible.cfg
[defaults]
forks = 50
4. Batched execution (serial)
# One host at a time (rolling update)
- name: Rolling update
  hosts: webservers
  serial: 1
# Increasing batch sizes
- name: Batch update
  hosts: webservers
  serial:
    - 1
    - 5
    - 10
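Rolling updates are often paired with a failure budget so that a bad release stops before it reaches every host. A sketch, with the percentages and the restarted service chosen only for illustration:

```yaml
- name: Rolling update with a failure budget
  hosts: webservers
  serial: "25%"              # a quarter of the hosts per batch
  max_fail_percentage: 20    # abort the play when more than 20% of a batch fails
  tasks:
    - name: Restart application
      ansible.builtin.service:
        name: myapp
        state: restarted
```

Note that the threshold must be exceeded, not merely reached, before the play aborts.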
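5. Asynchronous tasks
tasks:
  - name: Long running task
    command: /opt/app/batch.sh
    async: 3600   # allow the job up to one hour
    poll: 0       # do not wait; continue with the next task immediately
    register: batch_job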
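With `poll: 0` the play moves on immediately and never learns whether the job succeeded. If the result matters, a later task can check on it with `async_status`, using the job ID stored in the registered `batch_job` variable. The retry and delay values below are assumptions sized to match the one-hour `async` limit.

```yaml
tasks:
  - name: Wait for the batch job to finish
    ansible.builtin.async_status:
      jid: "{{ batch_job.ansible_job_id }}"
    register: batch_result
    until: batch_result.finished
    retries: 120    # 120 checks x 30 seconds = up to one hour
    delay: 30
```

Other tasks can run between starting the job and this check, which is where the time saving comes from.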
Security Best Practices
1. Use Vault
# Encrypt sensitive files
ansible-vault encrypt group_vars/all/vault.yml
# Use a password file
ansible-playbook site.yml --vault-password-file ~/.vault_pass
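2. Principle of least privilege
# ❌ Escalating to root for the whole play
- name: Configure webserver
  hosts: webservers
  become: yes
  tasks:
    - name: Install package
      apt:
        name: nginx
        state: present
# ✅ Escalating only on the tasks that need it
- name: Install package
  apt:
    name: nginx
    state: present
  become: yes
  become_user: root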
3. Hide sensitive output
tasks:
  - name: Configure secrets
    command: /opt/app/configure.sh
    no_log: true
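4. SSH security
# Use key-based authentication
# ansible.cfg
[defaults]
private_key_file = ~/.ssh/ansible_key
# Keep host key checking enabled; disabling it exposes connections to man-in-the-middle attacks
host_key_checking = True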
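Code Review Checklist
Playbook checks
- The playbook has a clear name
- Target hosts are correct
- become is used, but with minimal privileges
- Every task has a name
- tags are used for selective execution
- handlers are used to react to changes
- Sensitive data is stored in Vault
Variable checks
- Variable names are clear and consistent
- Defaults are sensible
- Sensitive variables are encrypted
- Variables are fully documented
Role checks
- Has a README
- Has default variables
- Tasks are clearly organized
- Has handlers
- Has test cases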
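Testing
Using Molecule
# Install Molecule
pip install molecule
# Initialize a test scenario inside an existing role
# (Molecule 6 and later; older releases used `molecule init role`)
cd roles/myrole
molecule init scenario
# Run the full test sequence
molecule test
# During development
molecule create
molecule converge
molecule verify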
Ansible-lint
# Install
pip install ansible-lint
# Run checks
ansible-lint site.yml
ansible-lint roles/myrole/
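ansible-lint reads its settings from a `.ansible-lint` file in the project root, which keeps local runs and CI consistent. A sketch follows; the chosen profile, excluded paths, and skipped rule are assumptions a team would adapt.

```yaml
# .ansible-lint (project root)
---
profile: production      # strictest built-in profile
exclude_paths:
  - .github/
  - files/
skip_list:
  - yaml[line-length]    # example of a rule a team may choose to skip
```

Starting with a looser profile such as `basic` and tightening over time tends to work better for existing code bases than enabling everything at once.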
CI/CD Integration
GitHub Actions
# .github/workflows/ansible.yml
name: Ansible CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install Ansible
        run: pip install ansible ansible-lint
      - name: Lint Ansible
        run: ansible-lint .
      - name: Syntax check
        run: ansible-playbook --syntax-check site.yml
      - name: Test playbook
        run: ansible-playbook site.yml --check
Next Steps
You now know the core Ansible best practices. Next, we turn to troubleshooting.
👉 Troubleshooting