Merging Metrics in Istio

How a microservice that already exposes a Prometheus endpoint can publish its existing business metrics through the Istio sidecar

Last updated August 6, 2022 · 2 min read · Original

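A few days ago, while reading the Istio documentation, I came across a feature that is only vaguely explained, Metrics Merging. The original text reads: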

This option is enabled by default but can be disabled by passing --set meshConfig.enablePrometheusMerge=false during installation. When enabled, appropriate prometheus.io annotations will be added to all data plane pods to set up scraping. If these annotations already exist, they will be overwritten. With this option, the Envoy sidecar will merge Istio’s metrics with the application metrics. The merged metrics will be scraped from /stats/prometheus:15020.

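Put plainly: the feature is on by default and can be turned off at install time with --set meshConfig.enablePrometheusMerge=false. When it is on, prometheus.io annotations are added to every data-plane Pod so that Prometheus scrapes it, and any such annotations that already exist are overwritten. The Envoy sidecar then merges the application's metrics with Istio's own, and Prometheus pulls the merged result from :15020/stats/prometheus.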

After reading it I was none the wiser, so I dug into the code and found a different description:

applyPrometheusMerge configures prometheus scraping annotations for the “metrics merge” feature. This moves the current prometheus.io annotations into an environment variable and replaces them pointing to the agent.

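In other words, this code implements metrics merging: it saves the Pod's current prometheus.io annotations into an environment variable and replaces them with annotations that point to the agent.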
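To make that comment concrete, here is a minimal Python sketch of the injection-time rewrite. It is an illustration only, not Istio's implementation (the real logic is Go code in the sidecar injector). The function name apply_prometheus_merge_sketch is hypothetical. The rule that an explicit prometheus.io/scrape value of "false" opts a Pod out is an assumption. The port 15020, the path /stats/prometheus and the variable name ISTIO_PROMETHEUS_ANNOTATIONS are taken from what the injected Pod shows later in this article.

```python
import json

# Values assumed from the injected Pod shown later in this article.
AGENT_PORT = "15020"
MERGED_PATH = "/stats/prometheus"
ENV_NAME = "ISTIO_PROMETHEUS_ANNOTATIONS"
PREFIX = "prometheus.io/"


def apply_prometheus_merge_sketch(annotations):
    """Hypothetical sketch of the metrics-merge rewrite done at injection time.

    Returns (new_pod_annotations, extra_sidecar_env).
    """
    new_annotations = dict(annotations)
    sidecar_env = {}

    # Assumption: a Pod that explicitly sets prometheus.io/scrape to "false" is left alone.
    if annotations.get(PREFIX + "scrape", "").lower() == "false":
        return new_annotations, sidecar_env

    # Move the app's own prometheus.io/* settings into a JSON env var,
    # e.g. {"path": "/", "port": "8000", "scrape": "true"}.
    original = {
        key[len(PREFIX):]: value
        for key, value in annotations.items()
        if key.startswith(PREFIX)
    }
    if original:
        sidecar_env[ENV_NAME] = json.dumps(original)

    # Add or overwrite the annotations so Prometheus scrapes the agent instead.
    new_annotations[PREFIX + "scrape"] = "true"
    new_annotations[PREFIX + "path"] = MERGED_PATH
    new_annotations[PREFIX + "port"] = AGENT_PORT
    return new_annotations, sidecar_env


if __name__ == "__main__":
    before = {
        "prometheus.io/path": "/",
        "prometheus.io/port": "8000",
        "prometheus.io/scrape": "true",
    }
    after, env = apply_prometheus_merge_sketch(before)
    print(after)
    print(env)
```

The sketch returns new dictionaries rather than mutating its input, which keeps the before/after comparison easy to inspect.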

Putting this together with the related code, the feature seems to work roughly like this:

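The scrape settings that a microservice declared with prometheus.io annotations before it joined the mesh are saved in an environment variable of its sidecar;
Metrics merging combines the Prometheus metrics exported by the meshed microservice (whose traffic the sidecar intercepts) with the sidecar's own metrics, and serves the result at :15020/stats/prometheus for Prometheus to pull.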
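The runtime side of this behavior can also be sketched in Python. Again, this is a simplified, hypothetical illustration and not the Istio agent's code. It rests on these assumptions:

- The saved JSON carries port and path keys, as the environment variable shown later in this article does.
- The sidecar reaches the application on localhost, because containers in a Pod share one network namespace.
- A missing path falls back to /metrics, borrowed from Prometheus's default metrics path.
- The Envoy stats URL is passed in as a parameter rather than asserted as Istio's internal address.

```python
import json
import urllib.request

ENV_NAME = "ISTIO_PROMETHEUS_ANNOTATIONS"


def app_metrics_url(saved_annotations, host="localhost"):
    """Rebuild the app's original scrape URL from the saved JSON.

    Assumptions: the sidecar reaches the app on localhost (same Pod network
    namespace), and a missing path falls back to Prometheus's default /metrics.
    """
    scrape = json.loads(saved_annotations)
    path = scrape.get("path") or "/metrics"
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{host}:{scrape['port']}{path}"


def merge_metrics(sidecar_text, app_text):
    """Naive merge: append the app's exposition after the sidecar's."""
    parts = [t if t.endswith("\n") else t + "\n" for t in (sidecar_text, app_text) if t]
    return "".join(parts)


def fetch_text(url, timeout=5.0):
    """Download a metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def merged_stats(envoy_stats_url, env):
    """Hypothetical handler body for the merged endpoint.

    envoy_stats_url is a parameter on purpose; this sketch does not claim
    which internal address Istio uses for Envoy's stats.
    """
    sidecar_text = fetch_text(envoy_stats_url)
    saved = env.get(ENV_NAME)
    if not saved:
        # No application annotations were saved: serve sidecar metrics only.
        return sidecar_text
    return merge_metrics(sidecar_text, fetch_text(app_metrics_url(saved)))
```

Inside a sidecar one would call merged_stats with the Envoy stats URL and os.environ. In this sketch, plain concatenation also shows why name collisions between application and sidecar metrics are a problem: a shared metric family name would simply appear twice in the merged output.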

As a sample application, we take the example code from Prometheus's Python client SDK (the prometheus-client package) and package it with the following Dockerfile:

FROM python:3.9.13-slim-buster
RUN pip install prometheus-client && mkdir app
COPY server.py /app/server.py
WORKDIR /app
EXPOSE 8000
CMD [ "python3", "server.py" ]

Run it with Docker and you can see the simple metrics it exports:

$ docker run -p 8000:8000 dustise/promclient:v0.1
Unable to find image 'dustise/promclient:v0.1' locally
v0.1: Pulling from dustise/promclient
...
Status: Downloaded newer image for dustise/promclient:v0.1
$ curl http://127.0.0.1:8000
...
# HELP request_processing_seconds_created Time spent processing request
# TYPE request_processing_seconds_created gauge
request_processing_seconds_created 1.6597804647800276e+09
...

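The output consists of request-related and Python-specific metrics, much like a microservice that already exposes its own monitoring metrics. So how do we merge these "business" metrics into the sidecar's output? According to the documentation quoted above, the Pod needs the Prometheus annotations, so we prepare a YAML file like this: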

apiVersion: apps/v1
kind: Deployment
...
  template:
    metadata:
...
      annotations:
        prometheus.io/path: /
        prometheus.io/port: "8000"
        prometheus.io/scrape: "true"
    spec:
...
---
apiVersion: v1
kind: Service
...

Inject the sidecar and apply the result to the cluster:

$ istioctl kube-inject -f promclient.yaml | kubectl apply -f -

Once that succeeds, check whether the new Pod changed the way the documentation says it would:

$ kubectl describe po promclient-6c74596f4f-r5z29 | grep prometheus.io
              prometheus.io/path: /stats/prometheus
              prometheus.io/port: 15020
              prometheus.io/scrape: true

Our original annotations have indeed been replaced with the defaults. Did the original values end up in an environment variable?

$ kubectl exec [pod] -c istio-proxy -- env | grep ANNO
ISTIO_PROMETHEUS_ANNOTATIONS={"scrape":"true","path":"/","port":"8000"}

Sure enough, there they are. Have the metrics actually been merged? Scrape port 15020 of the Pod directly (http below is the HTTPie client):

$ http 10.52.1.11:15020/stats/prometheus | grep python | more
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 101.0
python_gc_objects_collected_total{generation="1"} 273.0
...

As you can see, the application's metrics have been merged into the sidecar's metrics.
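Checking the merged output by eye works for a demo. A small standard-library Python helper can automate the same check. The helper names are hypothetical, and the parser is deliberately minimal: it collects metric names from sample lines of the Prometheus text format and does not handle label values that contain a closing brace.

```python
import re
import urllib.request

# Metric name, optional {label set}, then whitespace before the value.
# Simplification: label values containing "}" are not handled.
_SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s")


def metric_names(exposition):
    """Collect the names of all samples in a Prometheus text exposition."""
    names = set()
    for line in exposition.splitlines():
        # Skip blank lines and # HELP / # TYPE comments.
        if not line.strip() or line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(exposition, expected):
    """Return the expected metric names that do not appear, sorted."""
    return sorted(set(expected) - metric_names(exposition))


def fetch_text(url, timeout=5.0):
    """Download a metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example against the Pod from the session above (adjust the IP for your cluster):
# text = fetch_text("http://10.52.1.11:15020/stats/prometheus")
# print(missing_metrics(text, ["python_gc_objects_collected_total",
#                              "request_processing_seconds_created"]))
```

An empty list from missing_metrics means every expected application metric made it into the sidecar's merged endpoint.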

Simple as it is, this approach still does not fit some scenarios, for example:

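Scraping metrics over mTLS
Application metrics whose names collide with sidecar metrics
Prometheus not configured to scrape according to the standard annotations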

In any of these cases you will probably need to turn off metrics merging and configure scraping yourself.

崔秀龙
