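Step by Step: Building k8s Monitoring from Scratch, Part 3: alertmanager + Sending Alerts to Feishu
Preface
Picking up where the previous post left off: prometheus is installed and the monitoring data is there. Now we need to alert on that data and deliver the alerts to a messaging platform such as Feishu or DingTalk. This post uses Feishu for the test.
Environment setup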
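Component          Version
Operating system   Ubuntu 22.04.4 LTS
docker             24.0.7
alertmanager       v0.27.0

Download the manifest files
All manifest files used in this article are in the repository cloned below:

▶ cd /tmp && git clone git@github.com:wilsonchai8/installations.git && cd installations/prometheus

Install alertmanager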
alertmanager handles the alerts that prometheus sends to it, including delivering notifications and inhibiting alerts.

▶ cd /tmp/installations/prometheus
▶ kubectl apply -f alertmanager.yaml

Check that it started

▶ kubectl -n prometheus get pod -owide | grep alertmanager
alertmanager-5b6d594f6c-2swpw   1/1   Running   0   69s   10.244.0.17   minikube   <none>   <none>

Access the web page

▶ kubectl get node -owide
NAME       STATUS   ROLES           AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
minikube   Ready    control-plane   6d2h   v1.26.3   192.168.49.2   <none>        Ubuntu 20.04.5 LTS   6.8.0-45-generic   docker://23.0.2
▶ kubectl -n prometheus get svc | grep alertmanager
alertmanager   NodePort   10.110.182.95   <none>   9093:30297/TCP   70s

The page is reachable at the node IP plus the NodePort:
http://192.168.49.2:30297
Test alertmanager
1. Define a test deployment

▶ kubectl create deployment busybox-test --image=registry.cn-beijing.aliyuncs.com/wilsonchai/busybox:latest -- sleep 33333
deployment.apps/busybox-test created
▶ kubectl get pod
NAME                           READY   STATUS    RESTARTS   AGE
busybox-test-fcb69d5f9-tn8vx   1/1     Running   0          6s

2. Define the alert rule
We want an alert to fire when a deployment's replica count is 0, so we modify the prometheus configmap.
Append the rule at the very bottom. This effectively adds a new configuration file dedicated to alert rules.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-cm
  labels:
    name: prometheus-cm
  namespace: prometheus
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['alertmanager:9093']
    rule_files:
      - /etc/prometheus/*.rules
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
        - targets: ['localhost:9090']
      - job_name: "prometheus-kube-state-metrics"
        static_configs:
        - targets: ["kube-state-metrics.kube-system:8080"]
      - job_name: 'kubernetes-nodes'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - source_labels:   # value missing in the source text; presumably [__address__]
          regex: '(.*):10250'
          replacement: '${1}:9100'
          target_label: __address__
          action: replace
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
  # Everything from here on is newly added
  prometheus.rules: |-
    groups:
    - name: test alert
      rules:
      - alert: deployment replicas is 0
        expr: kube_deployment_spec_replicas == 0
        for: 30s
        labels:
          severity: slack
        annotations:
          summary: deployment replicas is 0

Then restart prometheus and check that the alert rule has taken effect.
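One way to confirm that the rule was loaded after the restart is the Prometheus HTTP API, which lists rule groups under /api/v1/rules. The following is a minimal sketch, not part of the original setup: the base URL is a placeholder (this article does not say how prometheus is exposed), and alerting_rule_states and fetch_rule_states are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumption: placeholder address; replace with wherever prometheus is reachable.
PROMETHEUS_URL = "http://<prometheus-host>:<port>"


def alerting_rule_states(rules_response):
    """Map each alerting rule name to its state in a /api/v1/rules response."""
    states = {}
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules also appear in this list; skip them.
            if rule.get("type") == "alerting":
                states[rule["name"]] = rule.get("state", "unknown")
    return states


def fetch_rule_states(base_url=PROMETHEUS_URL):
    """Fetch the rule list from prometheus and reduce it to name -> state."""
    with urlopen(base_url + "/api/v1/rules", timeout=5) as resp:
        return alerting_rule_states(json.load(resp))
```

If the rule was picked up, "deployment replicas is 0" should appear in the result. Its state is inactive while no deployment matches, pending while the expression has matched for less than the 30s "for" window, and firing after that.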
3. Trigger the alert

▶ kubectl scale --replicas=0 deploy busybox-test

Wait a moment, then check the alertmanager page.
An alert has already fired.
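Besides the web page, the same information is available from alertmanager's v2 API at /api/v2/alerts. The sketch below is a hedged example of reading it from Python. The base URL is the NodePort address shown earlier in this article; summarize_alerts and fetch_alerts are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# NodePort address from this article; adjust for your own cluster.
ALERTMANAGER_URL = "http://192.168.49.2:30297"


def summarize_alerts(alerts):
    """Reduce a /api/v2/alerts response to (alertname, deployment, state) tuples."""
    rows = []
    for alert in alerts:
        labels = alert.get("labels", {})
        state = alert.get("status", {}).get("state", "")
        rows.append((labels.get("alertname", ""), labels.get("deployment", ""), state))
    return rows


def fetch_alerts(base_url=ALERTMANAGER_URL):
    """Fetch the current alerts from alertmanager and summarize them."""
    with urlopen(base_url + "/api/v2/alerts", timeout=5) as resp:
        return summarize_alerts(json.load(resp))
```

After scaling busybox-test to 0 and waiting out the 30s window, fetch_alerts() should include an entry for that deployment with state "active".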
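Send to Feishu
We now have an alert, but nothing delivers it to anyone yet. It needs to be sent to Feishu.
1. Create a Feishu alert group, add a bot to it, and get the bot's webhook
webhook:
https://open.feishu.cn/open-apis/bot/v2/hook/*******************
2. Create the message-sending service
Here we use a python tornado web service to receive the alerts sent by alertmanager.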
from tornado.ioloop import IOLoop
import tornado.httpserver as httpserver
import tornado.web
import requests
import json

WEBHOOK_URL = 'https://open.feishu.cn/open-apis/bot/v2/hook/********'


def send_to_feishu(msg_raw):
    """Send one Feishu text message per alert in the alertmanager payload."""
    headers = {
        'Content-Type': 'application/json'
    }
    for alert in msg_raw['alerts']:
        msg = '## Alert firing ##\n'
        msg += '\n'
        msg += 'Alert: {}\n'.format(alert['labels']['alertname'])
        msg += 'Time: {}\n'.format(alert['startsAt'])
        msg += 'Severity: {}\n'.format(alert['labels']['severity'])
        msg += 'Details:\n'
        msg += '    deploy: {}\n'.format(alert['labels']['deployment'])
        msg += '    namespace: {}\n'.format(alert['labels']['namespace'])
        msg += '    content: {}\n'.format(alert['annotations']['summary'])
        data = {
            'msg_type': 'text',
            'content': {
                'text': msg
            }
        }
        res = requests.Session().post(url=WEBHOOK_URL, headers=headers, json=data)
        print(res.json())


class SendmsgFlow(tornado.web.RequestHandler):
    def post(self, *args, **kwargs):
        # alertmanager POSTs a JSON body; decode it and forward each alert.
        send_to_feishu(json.loads(self.request.body.decode('utf-8')))


def applications():
    urls = []
    # The route tuple is missing in the source text; '/sendmsg' is inferred
    # from the webhook URL configured in alertmanager in the next step.
    urls.append((r'/sendmsg', SendmsgFlow))
    return tornado.web.Application(urls)


def main():
    app = applications()
    server = httpserver.HTTPServer(app)
    server.bind(10000, '0.0.0.0')
    server.start(1)
    IOLoop.current().start()


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt as e:
        IOLoop.current().stop()
    finally:
        IOLoop.current().close()

This script has been uploaded to the repository.
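To try the service without waiting for a real alert, it helps to have a sample request body on hand. The sketch below builds one with the fields the service reads (labels, annotations, startsAt), following the shape of alertmanager's webhook payload. It also shows a more tolerant variant of the message formatting that falls back to "-" when a label such as deployment is absent. SAMPLE_PAYLOAD, format_alert and feishu_text_body are hypothetical names, and the timestamp is made up for illustration.

```python
# Illustrative payload: values are invented; the field layout follows
# what alertmanager sends to a webhook receiver.
SAMPLE_PAYLOAD = {
    "version": "4",
    "status": "firing",
    "receiver": "default",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "deployment replicas is 0",
                "severity": "slack",
                "deployment": "busybox-test",
                "namespace": "default",
            },
            "annotations": {"summary": "deployment replicas is 0"},
            "startsAt": "2024-01-01T00:00:00Z",
        }
    ],
}


def format_alert(alert):
    """Build the message text for one alert, using '-' for missing fields."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return "\n".join([
        "## Alert firing ##",
        "",
        "Alert: {}".format(labels.get("alertname", "-")),
        "Time: {}".format(alert.get("startsAt", "-")),
        "Severity: {}".format(labels.get("severity", "-")),
        "Details:",
        "    deploy: {}".format(labels.get("deployment", "-")),
        "    namespace: {}".format(labels.get("namespace", "-")),
        "    content: {}".format(annotations.get("summary", "-")),
    ])


def feishu_text_body(text):
    """Wrap text in the JSON body the service posts to the Feishu webhook."""
    return {"msg_type": "text", "content": {"text": text}}
```

With the service running locally, posting SAMPLE_PAYLOAD as JSON to http://127.0.0.1:10000/sendmsg (for example with requests.post) should produce one message in the Feishu group. The fallback matters because alerts from rules that are not about deployments carry no deployment label, and indexing the label directly would raise KeyError.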
3. Modify the alertmanager configmap
Edit alertmanager's configmap and point webhook_configs at the sendmsg API address.
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: prometheus
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 5m
      receiver: default
    receivers:
    - name: 'default'
      webhook_configs:
      - url: 'http://127.0.0.1:10000/sendmsg'

Restart alertmanager.
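The group_by setting explains why the webhook body carries a list of alerts: alertmanager batches alerts that share the same values for the listed labels into one notification. The sketch below only illustrates that grouping idea and is not alertmanager's implementation. group_alerts is a hypothetical name, and treating a missing label as an empty string is a simplification made for this example.

```python
from collections import defaultdict

# Same label names as group_by in the configmap above.
GROUP_BY = ["alertname", "cluster"]


def group_alerts(alerts, group_by=GROUP_BY):
    """Group alerts by the tuple of their values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # Simplification: a label the alert does not carry counts as "".
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)
```

Under this config, two deployments scaled to 0 at the same time share the same alertname, so they would arrive at the receiver together in one request with two entries in alerts.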
4. Check Feishu
With that, a simple alerting workflow is complete.
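Contact me
Feel free to get in touch for a more in-depth discussion.

That concludes this article.
My knowledge is limited; if anything here is wrong or incomplete, I would welcome your corrections.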