Prometheus for Monitoring Specific Log Counts

Thanks to @云原生小白 for the tip.

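When monitoring a system, we sometimes only want to know how many times, or how often, certain content shows up in the logs. We don't care about the content itself, and we don't want to deploy Loki or Elasticsearch just for that. In this situation Fluentd's rich (if somewhat flashy) plugin ecosystem can do the job.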

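Fluentd has a Prometheus plugin that exposes collected data through an endpoint Prometheus can scrape. The plugin has to be installed with fluent-gem; when running in Docker, the following Dockerfile can be used: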

FROM fluentd:v1.9.1-1.0
USER root
RUN fluent-gem install fluent-plugin-prometheus
USER fluent

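The basic way to configure the plugin is to declare the prometheus type with a <metric> element that defines the structure of the metric. For example, the documentation uses: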

  @type prometheus
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of incoming records
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>

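Placed in a <filter>, a metric like this counts incoming records; placed in a <match>, it counts outgoing records.

The snippet above defines a metric named fluentd_input_status_num_records_total of type counter.

Once the metric is defined, it still has to be exposed to Prometheus: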

<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

This configuration defines a Prometheus endpoint listening on port 24231, with the path /metrics.

An Example

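Let's walk through a complete scenario. Suppose we want to monitor the number of warnings in /data/input.txt. We use the full configuration at the end of this post, explained section by section here:

The <source> section defines the file to collect
The first <filter> uses @type prometheus to count incoming records, producing the metric fluentd_input_status_num_records_total of type counter
The second <filter> uses the regular-expression plugin @type grep to filter the input
The <match> section uses @type copy to fan out the output
The first <store> emits the Prometheus metric fluentd_output_status_num_records_total, counting the records that passed the filter
The second <store> prints the output to stdout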
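
As a rough offline cross-check of what the two counters should report, the same counts can be computed straight from the file. This is only a sketch: count_input and count_output are hypothetical helper names, not part of Fluentd, and the numbers will only agree with the metrics for lines that Fluentd actually read since its counters were last reset (by default the tail input starts at the end of the file).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Fluentd): offline equivalents of the two counters.

# count_input FILE: total number of lines in FILE,
# roughly what the first <filter> counts as incoming records.
count_input() {
    # An empty pattern matches every line, so -c yields the line count.
    grep -c '' "$1"
}

# count_output FILE: number of lines containing "warn",
# roughly what survives the grep filter and reaches <match>.
count_output() {
    grep -c 'warn' "$1"
}
```

With the directory layout used by the start script in this post, that would be count_input log/input.txt and count_output log/input.txt on the host.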

With the configuration in place, start the collection with a script like this:

#!/bin/sh
docker run -it --rm \
        -v $(pwd)/etc:/etc/fluentd \
        -v $(pwd)/log:/data \
        -p 12345:12345 \
        fluentd:prom \
        fluentd -c /etc/fluentd/fluentd.conf

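After startup, write something to the log, for example echo "warn" >> input.txt. Fluentd's own log then shows a line like 2021-08-14 07:06:55.688191458 +0000 custom.log: {"message":"warn"}. Accessing the exposed :12345/metrics with curl returns output that includes the following: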

fluentd_input_status_num_records_total{tag="custom.log",hostname="757214c8a91a"} 2.0
fluentd_output_status_num_records_total{tag="custom.log",hostname="757214c8a91a"} 1.0
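
To pull a single number out of that output in a script, something like the following sketch may help. metric_value is a hypothetical helper, and it assumes each sample line ends with the value (no trailing timestamp), which matches the output shown above.

```shell
#!/bin/sh
# metric_value NAME: read Prometheus text-format output on stdin and print
# the sum of all samples whose metric name is exactly NAME (0 if none).
# Hypothetical helper; assumes the value is the last field on each sample line.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                   # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)    # drop the label part, keep the bare name
            if (metric == name) total += $NF
        }
        END { print total + 0 }
    '
}
```

Assuming the port is published on localhost as in the start script, usage would look like curl -s localhost:12345/metrics | metric_value fluentd_output_status_num_records_total.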

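This is the familiar Prometheus metric format. In Kubernetes, once the Pod is annotated so that it falls within the scrape scope, these metrics can be used like any other monitoring metric.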

fluentd.conf

<source>
  @type tail
  path /data/input.txt
  pos_file /data/input.pos
  tag custom.log
  <parse>
    @type none
  </parse>
</source>
<filter custom.**>
  @type prometheus
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of incoming records
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>
<filter custom.**>
  @type grep
  <regexp>
    key message
    pattern /warn/
  </regexp>
</filter>
<match custom.**>
  @type copy
  <store>
    @type prometheus
    <metric>
      name fluentd_output_status_num_records_total
      type counter
      desc The total number of outgoing records
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
  </store>
  <store>
    @type stdout
  </store>
</match>

<source>
  @type prometheus
  bind 0.0.0.0
  port 12345
  metrics_path /metrics
</source>

<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>
