Deploying Prometheus Monitoring
2024-04-03
2 min read
Monitored hosts
Basic metrics collector (CPU, memory, network, disk)
Download node_exporter from the Prometheus download page (Download | Prometheus):
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64
mv node_exporter /usr/bin/
Create /usr/lib/systemd/system/node_exporter.service:
[Unit]
Description=node_exporter
Documentation=https://github.com/prometheus/node_exporter
After=network.target
[Service]
ExecStart=/usr/bin/node_exporter --web.listen-address=:9100
[Install]
WantedBy=multi-user.target
Start the node_exporter service:
systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
systemctl status node_exporter
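Once the service reports active, it is worth confirming that the exporter actually serves metrics. The helper below is a minimal sketch: `count_metric_samples` is a hypothetical name (not part of node_exporter), and it assumes the exporter listens on port 9100 as set in the unit file above.

```shell
# count_metric_samples NAME
# Reads Prometheus text-format metrics on stdin and prints how many
# sample lines belong to the metric NAME. Lines starting with "#"
# (HELP/TYPE comments) never match, so they are not counted.
count_metric_samples() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable in scripts that run with "set -e".
    grep -c "^$1[{ ]" || true
}
```

For example, `curl -s http://localhost:9100/metrics | count_metric_samples node_cpu_seconds_total` should print a non-zero number on a host where the exporter is working.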
Docker collector
Enable IPv4 forwarding:
echo -e "net.ipv4.ip_forward = 1\nnet.ipv4.conf.default.rp_filter = 0 \nnet.ipv4.conf.all.rp_filter = 0" >> /etc/sysctl.conf
sysctl -p
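Because `>>` appends every time the command is rerun, /etc/sysctl.conf can accumulate duplicate entries. The following is a sketch of an idempotent alternative, assuming GNU sed (for `sed -i`); `set_conf_value` is a hypothetical helper name introduced here for illustration.

```shell
# set_conf_value FILE KEY VALUE
# Any existing line for KEY in FILE is rewritten in place as
# "KEY = VALUE"; if no such line exists, one is appended.
set_conf_value() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally in the regex.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

A possible use is `set_conf_value /etc/sysctl.conf net.ipv4.ip_forward 1`, repeated for the two rp_filter keys, followed by `sysctl -p`.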
Install nvidia-container-runtime:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
yum install -y nvidia-container-runtime
Run cAdvisor (a monitor dedicated to Docker containers):
docker run -d -p 8080:8080 --name cadvisor --privileged=true -v /:/rootfs:ro -v /var/run:/var/run:rw -v /sys:/sys:ro -v /var/lib/docker/:/var/lib/docker:ro google/cadvisor:latest
GPU collector
nvidia_gpu_exporter project page: https://github.com/utkuozdemir/nvidia_gpu_exporter/releases
yum install -y https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v1.2.0/nvidia-gpu-exporter_1.2.0_linux_amd64.rpm
Monitoring server
We deploy the Prometheus server with Docker.
Enable IPv4 forwarding:
echo -e "net.ipv4.ip_forward = 1\nnet.ipv4.conf.default.rp_filter = 0 \nnet.ipv4.conf.all.rp_filter = 0" >> /etc/sysctl.conf
sysctl -p
Create /opt/prometheus/prometheus.yml:
# my global config
global:
  scrape_interval: 15s # how often metrics are scraped from the monitored hosts
  evaluation_interval: 15s # how often alerting rules are evaluated
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Alerting rule files
rule_files:
  # - "first_rules.yml"
# Scrape targets (the monitored hosts)
scrape_configs:
  - job_name: "GPU"
    static_configs:
      - targets: ['10.1.0.69:9835']
        labels:
          instance: node60
  - job_name: "Docker"
    static_configs:
      - targets: ['10.1.0.69:8080']
        labels:
          instance: node60
  - job_name: "Linux"
    static_configs:
      - targets: ['10.1.0.69:9100']
        labels:
          instance: node60
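Before starting Prometheus, it can help to list the scrape targets in the file and probe each one. The sketch below uses plain text matching rather than a YAML parser, so it only recognizes IPv4 `host:port` targets like the ones above; `list_targets` is a hypothetical helper name.

```shell
# list_targets FILE
# Prints every IPv4 host:port pair found on non-comment lines of FILE,
# one per line, sorted and de-duplicated.
list_targets() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+' \
        | sort -u
}
```

One way to use it is a loop such as `for t in $(list_targets /opt/prometheus/prometheus.yml); do curl -s -o /dev/null -w "$t %{http_code}\n" "http://$t/metrics"; done`, which should print 200 for each reachable exporter. The prom/prometheus image also ships promtool, whose `check config` subcommand validates the file's syntax.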
Run Prometheus:
docker run -d --name=prometheus -p 9090:9090 -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
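After the container starts, the Prometheus HTTP API endpoint `/api/v1/targets` reports the health of each scrape target. The helper below is a rough sketch that counts health states with text tools; it assumes the API returns compact JSON (a JSON-aware tool such as jq would be more robust), and `summarize_target_health` is a hypothetical name.

```shell
# summarize_target_health
# Reads the JSON body of /api/v1/targets on stdin and prints one line
# per health state with its count, for example "3 up".
summarize_target_health() {
    grep -o '"health":"[a-z]*"' \
        | cut -d'"' -f4 \
        | sort \
        | uniq -c \
        | awk '{print $1, $2}'
}
```

With the address used in this post, that would be `curl -s http://10.1.1.249:9090/api/v1/targets | summarize_target_health`. Any line other than "up" points at a target worth checking.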
Open http://10.1.1.249:9090/ in a browser.
Grafana dashboards
Run Grafana:
docker run -d --name=grafana -p 3000:3000 grafana/grafana
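The default username and password are both admin.
Change the language and time zone
Add a data source
Enter the Prometheus address from above and click Save.
If the settings are correct, a confirmation pop-up appears.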
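The same data source can also be created without the UI through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request body; `datasource_payload` is a hypothetical helper, it does not escape quotes in its arguments, and the field values (proxy access, default data source) are assumptions rather than settings taken from this post.

```shell
# datasource_payload NAME URL
# Prints a JSON body describing a Prometheus data source that Grafana
# reaches through its own backend ("proxy" access).
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}
```

Assuming Grafana is reachable at GRAFANA_HOST:3000 with the default credentials, a call might look like `curl -s -u admin:admin -H 'Content-Type: application/json' -X POST http://GRAFANA_HOST:3000/api/datasources -d "$(datasource_payload Prometheus http://10.1.1.249:9090)"`.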
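Add a dashboard
Choose to import one.
Find a dashboard you like on Dashboards | Grafana Labs.
Open its page and copy the ID.
Paste the ID into the import field and click Load.
Select our data source, then click Import.
That's it.
This dashboard shows data from the node_exporter collector; you can add dashboards for the GPU and Docker collectors the same way.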