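In production, we need to probe, monitor, and alert on our services and ports so that we learn about a change in their status as soon as it happens. blackbox_exporter provides several kinds of probes for this, including HTTP(S), TCP, ICMP, and DNS (over UDP or TCP).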
Install blackbox_exporter
Download: https://github.com/prometheus/blackbox_exporter (prebuilt tarballs are on the project's Releases page)
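```shell
tar zxvf blackbox_exporter-0.18.0.linux-amd64.tar.gz
cd blackbox_exporter-0.18.0.linux-amd64
cp blackbox_exporter /usr/local/bin
# Start: the exporter reads blackbox.yml from the current directory by default,
# listens on port 9115, and writes its log to stderr.
mkdir -p logs
nohup blackbox_exporter --config.file=blackbox.yml >> logs/blackbox_exporter.log 2>&1 &
```

On Linux, the ICMP probe needs elevated privileges: run the exporter as root, grant it the CAP_NET_RAW capability, or run it as a user whose group is within net.ipv4.ping_group_range.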
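Before wiring up Prometheus, a running exporter can be checked by hand: every probe is an HTTP GET to its /probe endpoint with `module` and `target` query parameters, and the response is Prometheus text format containing `probe_success`. The Python sketch below builds that URL and reads the result. The helper names are hypothetical, and it assumes the exporter listens on the default address 127.0.0.1:9115.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the exporter runs locally on its default port.
EXPORTER = "127.0.0.1:9115"


def build_probe_url(module: str, target: str, exporter: str = EXPORTER) -> str:
    """Build the /probe URL for one module/target pair (hypothetical helper)."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


def probe_succeeded(metrics_text: str) -> bool:
    """Return True if the exposition text reports probe_success 1."""
    for line in metrics_text.splitlines():
        # Skip "# HELP" / "# TYPE" comments; match only the bare sample line.
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False  # metric missing: treat the probe as failed


def run_probe(module: str, target: str, timeout: float = 10.0) -> bool:
    """Query a running exporter once (requires network access to it)."""
    with urlopen(build_probe_url(module, target), timeout=timeout) as resp:
        return probe_succeeded(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_probe_url("http_2xx", "http://www.baidu.com"))
```

`run_probe("icmp", "180.101.49.11")` performs one real probe; the URL builder and parser are kept separate so they can be checked without a running exporter.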
Configure Prometheus
Add one scrape job per probe module to prometheus.yml. In each job, relabel_configs passes the listed target to the exporter as the `target` URL parameter, keeps it as the `instance` label, and redirects the actual scrape to the exporter at 127.0.0.1:9115.

```yaml
scrape_configs:
  - job_name: icmp_probe
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['180.101.49.11']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115

  - job_name: tcp_probe
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['172.19.205.2:8080']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115

  - job_name: http_probe
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['http://www.baidu.com']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
```
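The three relabel steps are the least obvious part of these jobs. The Python sketch below replays them on a target's labels, modelling only Prometheus's default `replace` action with the default regex `(.*)`, default separator `;`, and default replacement `$1`. `RelabelRule` and `relabel` are hypothetical names for illustration, not a Prometheus API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RelabelRule:
    """One relabel_configs entry using the default replace action and regex (.*)."""
    target_label: str
    source_labels: List[str] = field(default_factory=list)
    replacement: Optional[str] = None  # None stands for the default "$1"


def relabel(labels: Dict[str, str], rules: List[RelabelRule]) -> Dict[str, str]:
    """Apply the rules in order, each one seeing the output of the previous."""
    out = dict(labels)
    for rule in rules:
        # Source label values are joined with the default separator ";".
        value = ";".join(out.get(name, "") for name in rule.source_labels)
        # With regex (.*), "$1" is the whole joined value; an explicit
        # replacement (no capture groups) is written as-is.
        out[rule.target_label] = value if rule.replacement is None else rule.replacement
    return out


# The same three rules used by every probe job above.
PROBE_RULES = [
    RelabelRule("__param_target", ["__address__"]),
    RelabelRule("instance", ["__param_target"]),
    RelabelRule("__address__", replacement="127.0.0.1:9115"),
]

if __name__ == "__main__":
    print(relabel({"__address__": "172.19.205.2:8080"}, PROBE_RULES))
```

For the tcp_probe target, `__address__` ends up as 127.0.0.1:9115 while `__param_target` and `instance` both hold 172.19.205.2:8080. Prometheus therefore scrapes the exporter's /probe endpoint with target=172.19.205.2:8080, and the resulting series are labelled with the probed host rather than the exporter. Rule order matters: moving the third rule first would overwrite the address before it is copied.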
Configure the Grafana dashboard
In Grafana, import the dashboard with ID 13659 and select your Prometheus data source.
Configure alerting rules
Save the following rules in a rules file (for example blackbox_rules.yml, a name chosen here for illustration) and list that file under rule_files in prometheus.yml.

```yaml
groups:
  - name: blackbox_rules
    rules:
      - alert: BlackboxProbeFailed
        expr: probe_success == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Blackbox probe failed (instance {{ $labels.instance }})"
          description: "Probe failed\n  VALUE = {{ $value }}"
      - alert: BlackboxSlowProbe
        expr: probe_duration_seconds > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Blackbox slow probe (instance {{ $labels.instance }})"
          description: "Blackbox probe took more than 1s to complete\n  VALUE = {{ $value }}"
      - alert: BlackboxProbeHttpFailure
        expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Blackbox probe HTTP failure (instance {{ $labels.instance }})"
          description: "HTTP status code is not 200-399\n  VALUE = {{ $value }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 3600 / 24 < 7
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Blackbox SSL certificate will expire soon (instance {{ $labels.instance }})"
          description: "SSL certificate expires in less than 7 days\n  VALUE = {{ $value }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 3600 / 24 < 3
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Blackbox SSL certificate will expire soon (instance {{ $labels.instance }})"
          description: "SSL certificate expires in less than 3 days\n  VALUE = {{ $value }}"
      - alert: BlackboxSslCertificateExpired
        expr: (probe_ssl_earliest_cert_expiry - time()) <= 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Blackbox SSL certificate expired (instance {{ $labels.instance }})"
          description: "SSL certificate has expired already\n  VALUE = {{ $value }}"
      - alert: BlackboxProbeSlowHttp
        expr: probe_http_duration_seconds > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Blackbox probe slow HTTP (instance {{ $labels.instance }})"
          description: "HTTP request took more than 1s\n  VALUE = {{ $value }}"
      - alert: BlackboxProbeSlowPing
        expr: probe_icmp_duration_seconds > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Blackbox probe slow ping (instance {{ $labels.instance }})"
          description: "Blackbox ping took more than 1s\n  VALUE = {{ $value }}"
      - alert: BlackboxProbeSlowDNS
        expr: probe_dns_lookup_time_seconds > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Blackbox probe slow DNS (instance {{ $labels.instance }})"
          description: "Blackbox DNS lookup took more than 1s\n  VALUE = {{ $value }}"
```
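The threshold logic in the SSL certificate and HTTP status rules is easy to misread in PromQL. The Python sketch below restates those expressions for a single sample so their boundaries can be tested directly. The function names are hypothetical, and the code only mirrors the arithmetic; it does not query Prometheus.

```python
import time
from typing import List, Optional

SECONDS_PER_DAY = 3600 * 24


def days_until_expiry(cert_expiry_ts: float, now: Optional[float] = None) -> float:
    """Mirror (probe_ssl_earliest_cert_expiry - time()) / 3600 / 24."""
    if now is None:
        now = time.time()
    return (cert_expiry_ts - now) / SECONDS_PER_DAY


def ssl_conditions(cert_expiry_ts: float, now: Optional[float] = None) -> List[str]:
    """List which of the three SSL rule conditions hold for one sample."""
    days = days_until_expiry(cert_expiry_ts, now)
    held = []
    if days < 7:
        held.append("expires in < 7 days")
    if days < 3:
        held.append("expires in < 3 days")
    if days <= 0:  # same as (expiry - time()) <= 0 in seconds
        held.append("expired")
    return held


def http_failure(status_code: int) -> bool:
    """Mirror probe_http_status_code <= 199 or probe_http_status_code >= 400."""
    return status_code <= 199 or status_code >= 400


if __name__ == "__main__":
    now = time.time()
    print(ssl_conditions(now + 1 * SECONDS_PER_DAY, now))
    print(http_failure(302), http_failure(503))
```

Because the SSL thresholds are independent rather than mutually exclusive, a certificate one day from expiry satisfies both the 7-day and the 3-day conditions, and an expired one satisfies all three, so several alerts fire at once. Alertmanager inhibition rules, or non-overlapping ranges in the expressions, are two ways to suppress the lower-severity duplicates.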