Kubernetes in Practice (19): Getting Started with Prometheus

1. Basic Concepts

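　　Prometheus provides a functional expression language, PromQL, that lets users select and aggregate time series data in real time. An expression's result can be shown as a graph or as a table in the Prometheus expression browser. External systems can also consume it as a data source through the HTTP API.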

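　　As a time series database, Prometheus stores the data it collects as files on local disk, by default under data/. The startup flag --storage.tsdb.path="DATA_DIR/" changes the storage path, and --config.file=/etc/prometheus/config/prometheus.yaml specifies the configuration file.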

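　　Prometheus Server does not monitor targets directly. Its main job is to collect and store data so that it can be queried. It periodically pulls (scrapes) monitoring samples from the HTTP endpoint /metrics exposed by each exporter, for example: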

# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="21600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="43200"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="86400"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="172800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="345600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="604800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="2.592e+06"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="7.776e+06"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1.5552e+07"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3.1104e+07"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="+Inf"} 0

　　HELP: what the metric means

　　TYPE: the metric's data type
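　　To make the format concrete, here is a minimal Python sketch that splits exposition text like the sample above into HELP text, TYPE and samples. It is a simplification, not the official parser: it assumes label values contain no commas, escaped quotes or spaces, and that samples carry no timestamps.

```python
def parse_exposition(text):
    """Parse simplified Prometheus text exposition.

    Returns (help_text, types, samples) where samples is a list of
    (metric_name, labels_dict, float_value). Assumes no escapes, no commas
    or spaces inside label values, and no trailing timestamps.
    """
    help_text, types, samples = {}, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP "):
            name, _, doc = line[len("# HELP "):].partition(" ")
            help_text[name] = doc
        elif line.startswith("# TYPE "):
            name, _, kind = line[len("# TYPE "):].partition(" ")
            types[name] = kind
        elif line.startswith("#"):
            continue  # any other comment line is ignored
        else:
            # The value is the last space-separated field.
            series, _, value = line.rpartition(" ")
            labels = {}
            if "{" in series:
                name, _, rest = series.partition("{")
                for pair in rest.rstrip("}").split(","):
                    if pair:
                        key, _, val = pair.partition("=")
                        labels[key] = val.strip('"')
            else:
                name = series
            # float() understands "+Inf" and exponent notation like "2.592e+06".
            samples.append((name, labels, float(value)))
    return help_text, types, samples
```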

2. Querying Data

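　　Queries can be run from the Graph page of the Prometheus web UI:

　　Query the node CPU load.

　　Query the growth rate of CPU usage over 2 minutes.

　　Ignore the individual CPU (aggregate across all CPUs).

　　Overall usage: idle is the idle time, so 1 minus the idle share is the usage.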
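　　The "1 minus idle" step can be sketched in Python. This assumes we already hold the cumulative idle seconds of each CPU at two points in time. That is the kind of data node_cpu{mode="idle"} exposes. The dictionaries and CPU ids here are hypothetical.

```python
def cpu_usage(idle_before, idle_after, interval_seconds):
    """Estimate overall CPU usage from per-CPU idle-time counters.

    idle_before / idle_after map a CPU id to its cumulative idle seconds,
    sampled interval_seconds apart. The idea mirrors
    1 - avg(rate(node_cpu{mode="idle"}[2m])).
    """
    # Per-CPU idle seconds per second, i.e. the idle share of each CPU.
    idle_rates = [
        (idle_after[cpu] - idle_before[cpu]) / interval_seconds
        for cpu in idle_before
    ]
    # Averaging over all CPUs drops the individual-CPU dimension.
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 1 - avg_idle
```

　　For example, one CPU idle 75% of the time and another idle 25% of the time give an overall usage of 0.5.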

3. Query Syntax

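　　Metric types:

　　- Counter: a counter that only goes up

　　 A Counter metric works like a counter: it only increases, unless it is reset (for example, when the process restarts). Typical Counter metrics are http_requests_total and node_cpu. Counter metric names should preferably end with the _total suffix.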

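　　 Counter is simple yet powerful. PromQL's built-in aggregation operators and functions let us analyze its data further:

　　 - Per-second growth rate of HTTP requests: rate(http_requests_total[5m])

　　 - The top 10 HTTP endpoints by request count: topk(10, http_requests_total)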
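　　Because a Counter only goes down when it is reset, functions over Counters treat any drop as a restart from zero. The Python sketch below shows that reset correction on a plain list of consecutive counter values. It is a simplification: Prometheus also extrapolates to the edges of the time window, which this sketch omits.

```python
def counter_increase(values):
    """Total increase of a counter across consecutive samples.

    A drop in value is treated as a reset to zero (e.g. a process restart),
    so the post-reset value itself counts as new increase.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total

# 10 -> 15 -> 20, then a restart, then 3 -> 8: increase is 5 + 5 + 3 + 5
print(counter_increase([10, 15, 20, 3, 8]))
```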

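　　- Gauge: a gauge that can go up and down

　　Unlike a Counter, a Gauge reflects the current state of the system. Typical Gauge metrics are node_memory_MemFree and node_memory_MemAvailable.

　　The built-in PromQL function delta() returns how much a sample changed over a time range, for example:

　　 - Difference in CPU temperature over two hours: delta(cpu_temp_celsius{host="zeus"}[2h])

　　 - Predict the free disk space 4 hours from now: predict_linear(node_filesystem_free{device="rootfs"}[1h], 4 * 3600)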
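　　The core of delta() on a Gauge is "last value minus first value" within the window, with no reset correction because a Gauge may legitimately go down. The Python sketch below shows only that core. Prometheus's delta() additionally extrapolates the result to the full window length, which is not reproduced here.

```python
def gauge_delta(samples):
    """Difference between the last and first gauge samples in a window.

    samples: list of (timestamp_seconds, value) sorted by time.
    No reset handling: a gauge going down is a real decrease.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return samples[-1][1] - samples[0][1]

# Hypothetical CPU temperatures over two hours.
print(gauge_delta([(0, 60.0), (3600, 65.5), (7200, 58.0)]))  # -2.0
```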

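　　- Histogram and Summary

　　 Histogram and Summary are mainly used to analyze how samples are distributed.

　　Most of the time we look at averages of quantitative metrics, such as average CPU usage or average page response time. The problem with averages is obvious. Suppose most API requests complete within 100 ms but a few take 5 s. The outliers pull the average up, yet the slow requests themselves stay hidden, so the average describes neither the typical request nor the slow ones. This is known as the long-tail problem.

　　To tell whether the system is uniformly slow or slow only in the tail, the simplest approach is to group requests by latency range. For example, count how many requests took 0~10 ms, how many took 10~20 ms, and so on. This quickly shows why the system is slow.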
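　　A small Python sketch of the grouping idea. The request mix (99 requests at 100 ms plus a single 5 s outlier) and the range boundaries are made-up numbers for illustration. The mean moves noticeably and hides the outlier, while the per-range counts show it directly.

```python
import statistics

def bucket_counts(latencies, bounds):
    """Count requests per latency range.

    bounds are ascending upper bounds in seconds. Each latency goes into
    the first range whose upper bound is >= the latency. Values above the
    last bound go into a final overflow range.
    """
    counts = [0] * (len(bounds) + 1)
    for x in latencies:
        for i, upper in enumerate(bounds):
            if x <= upper:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts

latencies = [0.1] * 99 + [5.0]       # hypothetical request latencies
print(statistics.mean(latencies))    # about 0.149: skewed by one request
print(statistics.median(latencies))  # 0.1
print(bucket_counts(latencies, [0.1, 0.5, 1.0]))  # [99, 0, 0, 1]
```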

　　For example, the metric prometheus_tsdb_wal_fsync_duration_seconds is a Summary. It records how long WAL fsync operations take in Prometheus Server. Its data looks like this:

# HELP prometheus_tsdb_wal_fsync_duration_seconds Duration of WAL fsync.
# TYPE prometheus_tsdb_wal_fsync_duration_seconds summary
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.5"} 0.006537877
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.9"} 0.032073855
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.99"} 0.040792282
prometheus_tsdb_wal_fsync_duration_seconds_sum 318.52705156399924
prometheus_tsdb_wal_fsync_duration_seconds_count 23846

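　　 prometheus_tsdb_wal_fsync_duration_seconds_count: total number of WAL fsync operations

　　 prometheus_tsdb_wal_fsync_duration_seconds_sum: total time spent in WAL fsync, in seconds

　　The quantile label describes the distribution: 0.5 is the median and 0.9 is the 90th percentile. So 50% of fsyncs took at most about 0.0065 s, and 90% took at most about 0.032 s.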
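　　Two things follow from these series, shown in the Python sketch below. First, _sum divided by _count gives the average duration; the numbers are copied from the output above. Second, a quantile is a position in the sorted observations. The nearest-rank method here is an exact, illustrative stand-in. Real client libraries typically use streaming approximations over a sliding time window instead of keeping every observation.

```python
import math

def average_from_summary(sum_value, count_value):
    """Mean observation: the _sum series divided by the _count series."""
    return sum_value / count_value

def nearest_rank_quantile(observations, phi):
    """Exact phi-quantile of raw observations using the nearest-rank method."""
    if not 0 <= phi <= 1 or not observations:
        raise ValueError("phi must be in [0, 1] and observations non-empty")
    ordered = sorted(observations)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]

# Values from the sample output above: roughly 0.0134 s per fsync.
avg_fsync = average_from_summary(318.52705156399924, 23846)
print(avg_fsync)
print(nearest_rank_quantile([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.9))  # 9
```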

　　Similarly, prometheus_tsdb_compaction_chunk_range is a Histogram metric, exposed as the _bucket, _sum and _count series. Its data looks like this:

# HELP prometheus_tsdb_compaction_chunk_range Final time range of chunks on their first compaction
# TYPE prometheus_tsdb_compaction_chunk_range histogram
prometheus_tsdb_compaction_chunk_range_bucket{le="100"} 32
prometheus_tsdb_compaction_chunk_range_bucket{le="400"} 32
prometheus_tsdb_compaction_chunk_range_bucket{le="1600"} 32
prometheus_tsdb_compaction_chunk_range_bucket{le="6400"} 32
prometheus_tsdb_compaction_chunk_range_bucket{le="25600"} 32
prometheus_tsdb_compaction_chunk_range_bucket{le="102400"} 2568
prometheus_tsdb_compaction_chunk_range_bucket{le="409600"} 5120
prometheus_tsdb_compaction_chunk_range_bucket{le="1.6384e+06"} 10534
prometheus_tsdb_compaction_chunk_range_bucket{le="6.5536e+06"} 6.591799e+06
prometheus_tsdb_compaction_chunk_range_bucket{le="2.62144e+07"} 6.592246e+06
prometheus_tsdb_compaction_chunk_range_bucket{le="+Inf"} 6.592246e+06
prometheus_tsdb_compaction_chunk_range_sum 2.3547538212574e+13
prometheus_tsdb_compaction_chunk_range_count 6.592246e+06

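　　Like a Summary, a Histogram also exposes the total number of observations (_count) and the sum of their values (_sum). The difference is that a Histogram directly shows how many samples fall into each range. The le label defines each range, and the buckets are cumulative: each one counts the samples less than or equal to its upper bound.

　　For a Histogram, the histogram_quantile() function calculates quantiles of its values. A Histogram's quantiles are computed on the server side, in Prometheus at query time. A Summary's quantiles are computed on the client side. A Summary therefore performs better when queried with PromQL, whereas a Histogram consumes more server resources.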
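　　Since a Histogram only stores bucket counts, histogram_quantile() has to estimate the quantile. The Python sketch below follows the approach PromQL uses. It finds the bucket that contains the target rank, then interpolates linearly inside it, which assumes observations are spread evenly within each bucket. It is a reimplementation for illustration, not Prometheus's code, and it only accepts 0 < q < 1.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper bound,
    ending with (math.inf, total).
    """
    if not 0 < q < 1:
        raise ValueError("this sketch only handles 0 < q < 1")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    # First finite bucket whose cumulative count reaches the target rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        # Rank falls in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower = 0.0  # the first bucket is assumed to start at 0
    if b > 0:
        lower, prev_count = buckets[b - 1]
        count -= prev_count
        rank -= prev_count
    # Linear interpolation inside the chosen bucket.
    return lower + (upper - lower) * (rank / count)

# Bucket data copied from the prometheus_tsdb_compaction_chunk_range output.
chunk_range = [(100, 32), (400, 32), (1600, 32), (6400, 32), (25600, 32),
               (102400, 2568), (409600, 5120), (1.6384e6, 10534),
               (6.5536e6, 6.591799e6), (2.62144e7, 6.592246e6),
               (math.inf, 6.592246e6)]
print(histogram_quantile(0.5, chunk_range))  # lands inside (1.6384e6, 6.5536e6]
```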

4. PromQL

　　Selecting time series:

　　- By metric name: http_requests_total or http_requests_total{}

　　- Filtering by label with = or !=, e.g. http_requests_total{instance="localhost:9090"} or http_requests_total{instance!="localhost:9090"}

　　- Regular-expression matching with label=~regex or label!~regex, e.g. http_requests_total{environment=~"staging|testing|development",method!="GET"}
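　　The four matcher operators can be sketched in Python as below. Two PromQL behaviors are reproduced: a missing label compares as an empty string, and a regular expression must match the whole label value. Python's re module stands in for the RE2 engine Prometheus uses, so exotic regex syntax may differ.

```python
import re

def matches(labels, matchers):
    """Check a series' labels against PromQL-style matchers.

    matchers: list of (label_name, op, value) with op in =, !=, =~, !~.
    """
    for name, op, value in matchers:
        actual = labels.get(name, "")  # missing label behaves like ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # fully anchored
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator {op!r}")
        if not ok:
            return False
    return True

query = [("environment", "=~", "staging|testing|development"),
         ("method", "!=", "GET")]
print(matches({"environment": "staging", "method": "POST"}, query))       # True
print(matches({"environment": "prod-staging", "method": "POST"}, query))  # False
```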

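　　Range queries: selecting a series directly by metric name (e.g. http_requests_total) returns only the latest sample of each series. This result is called an instant vector, and the expression is an instant vector expression. To get the data over a past time range, use a range vector, with the range given in [].

　　- Samples with status code 400 from the last 5 minutes: http_requests_total{code="400"}[5m]

　　Offset: instant and range vectors are evaluated relative to the current time. The offset modifier shifts that reference time into the past.

　　- The instant sample from 50 minutes ago: http_requests_total{code="400"} offset 50m

　　- Samples in a 1-minute range one day ago: http_requests_total{code="200"}[1m] offset 1d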
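　　How a range and an offset combine can be sketched in Python. The window ends at the evaluation time minus the offset and reaches back by the range length. Here the window is treated as left-open. Whether the boundaries are inclusive has differed between Prometheus versions, so this is an approximation.

```python
def select_range(samples, eval_time, range_seconds, offset_seconds=0):
    """Samples that a selector like metric[5m] offset 50m would return.

    samples: list of (timestamp_seconds, value).
    """
    end = eval_time - offset_seconds
    start = end - range_seconds
    return [(t, v) for t, v in samples if start < t <= end]

samples = [(t, float(t)) for t in range(0, 601, 60)]   # one sample per minute
print(select_range(samples, 600, 300))                 # last 5 minutes
print(select_range(samples, 600, 300, offset_seconds=300))  # 5 minutes before that
```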

　　Aggregation: when the labels in a query do not uniquely identify one series, PromQL returns every time series that matches. Aggregation operators combine these series into new time series.

　　- Total number of HTTP requests in the system: sum(http_requests_total)

　　- Total number of HTTP requests that returned 404: sum(http_requests_total{code="404"})

　　- Average CPU time of the hosts, grouped by mode: avg(node_cpu) by (mode)

　　- CPU usage of each host: sum(irate(node_cpu{mode!='idle'}[5m])) by (instance) / sum(irate(node_cpu[5m])) by (instance)
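　　The grouping behind "sum(...) by (instance)" can be sketched in Python. Every label not listed after by is dropped, and series that share the remaining labels are combined. The per-mode rates below are made-up numbers standing in for irate(node_cpu[5m]).

```python
from collections import defaultdict

def aggregate_by(series, by, op=sum):
    """Group (labels, value) pairs by the listed labels and aggregate each group."""
    groups = defaultdict(list)
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: op(values) for key, values in groups.items()}

# Hypothetical per-mode CPU rates for two hosts.
rates = [
    ({"instance": "a", "mode": "idle"}, 0.6),
    ({"instance": "a", "mode": "user"}, 0.3),
    ({"instance": "a", "mode": "system"}, 0.1),
    ({"instance": "b", "mode": "idle"}, 0.9),
    ({"instance": "b", "mode": "user"}, 0.1),
]
busy = aggregate_by([s for s in rates if s[0]["mode"] != "idle"], ["instance"])
total = aggregate_by(rates, ["instance"])
usage = {key: busy[key] / total[key] for key in total}
print(usage)  # host a about 0.4, host b about 0.1
```

　　Passing op=lambda values: sum(values) / len(values) turns the same helper into avg ... by (...).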

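　　Scalars and strings:

　　- Scalar: a single floating-point number with no time dimension, e.g. 10. Note that count returns an instant vector, not a scalar. scalar() converts a single-element instant vector into a scalar.

　　- String:

　　 - Single-quoted: escape sequences are processed: 'these are unescaped: \n \\ \t'

　　 - Backtick (raw): escape sequences are not processed: `these are not unescaped: \n ' " \t`

　　 - Double-quoted: same escaping rules as single quotes: "this is a string"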
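　　The count-then-scalar conversion can be sketched in Python, with an instant vector modelled as a list of (labels, value) pairs. As in PromQL, scalar() yields NaN unless the vector has exactly one element. count() of an empty vector is itself empty, so scalar() of it is NaN.

```python
import math

def count(vector):
    """PromQL-style count(): one element with no labels, or empty for empty input."""
    return [({}, float(len(vector)))] if vector else []

def scalar(vector):
    """Return the sample value of a single-element vector, otherwise NaN."""
    if len(vector) != 1:
        return math.nan
    return vector[0][1]

up = [({"instance": "a"}, 1.0), ({"instance": "b"}, 1.0)]
print(scalar(count(up)))  # 2.0
print(scalar(up))         # nan: two elements
```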

5. PromQL Operators

　　Arithmetic operators: perform arithmetic on query results.

　　- Convert total memory from bytes to MB: node_memory_MemTotal / 1024 / 1024

　　- Disk bytes read plus written: node_disk_bytes_written{device=~"sda|sdb"} + node_disk_bytes_read{device=~"sda|sdb"}
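　　Two arithmetic cases behave differently, as the Python sketch below shows. A vector combined with a scalar applies the scalar to every element. Two vectors are paired element by element on identical label sets, ignoring the metric name, and unmatched elements are dropped. In both cases the result loses its metric name. The sketch covers only the default one-to-one matching, without on()/ignoring() or group_left/group_right.

```python
def drop_name(labels):
    """Label set without the metric name, usable as a dict key."""
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")

def add_vectors(left, right):
    """vector + vector with default one-to-one label matching."""
    right_index = {drop_name(labels): value for labels, value in right}
    result = []
    for labels, value in left:
        key = drop_name(labels)
        if key in right_index:  # elements without a partner are dropped
            result.append((dict(key), value + right_index[key]))
    return result

def bytes_to_mb(vector):
    """vector / 1024 / 1024: the scalar is applied to every element."""
    return [(dict(drop_name(labels)), value / 1024 / 1024) for labels, value in vector]

written = [({"__name__": "node_disk_bytes_written", "device": "sda"}, 100.0),
           ({"__name__": "node_disk_bytes_written", "device": "sdb"}, 50.0)]
read = [({"__name__": "node_disk_bytes_read", "device": "sda"}, 10.0)]
print(add_vectors(written, read))  # [({'device': 'sda'}, 110.0)]
```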

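　　Comparison (boolean) operators: when an instant vector is compared with a scalar, PromQL compares the value of every sample in the vector. Samples for which the comparison is true are kept, and the rest are dropped.

　　- Nodes with less than 50% of memory available: node_memory_MemAvailable / node_memory_MemTotal < 0.5 (MemAvailable already accounts for free memory, so adding MemFree to it would count that memory twice)

　　The bool modifier: by default a comparison filters the series. When a real boolean result is needed instead, the bool modifier makes the comparison return 1 (true) or 0 (false) for every sample.

　　- Return 1 where the HTTP request count is greater than 1000, otherwise 0: http_requests_total > bool 1000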
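　　The difference between a filtering comparison and one with bool is easy to see in a Python sketch over (labels, value) pairs. The request counts are hypothetical. The filter keeps only matching elements. bool keeps every element but replaces its value with 1.0 or 0.0.

```python
def compare_filter(vector, threshold):
    """vector > threshold: keep only elements whose value passes."""
    return [(labels, value) for labels, value in vector if value > threshold]

def compare_bool(vector, threshold):
    """vector > bool threshold: keep every element, value becomes 1.0 or 0.0."""
    return [(labels, 1.0 if value > threshold else 0.0) for labels, value in vector]

requests = [({"code": "200"}, 1500.0), ({"code": "404"}, 20.0)]
print(compare_filter(requests, 1000))  # only the code="200" element
print(compare_bool(requests, 1000))    # both elements, values 1.0 and 0.0
```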

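　　Set operators: apply set operations between two instant vectors.

　　- and (intersection): vector1 and vector2 produces a new vector made of the elements of vector1 that have an exactly matching label set in vector2.

　　- or (union): vector1 or vector2 contains all elements of vector1, plus the elements of vector2 whose label sets have no match in vector1.

　　- unless (complement): the new vector consists of the elements of vector1 that have no matching element in vector2.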
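　　The three set operators can be sketched in Python by comparing label sets, ignoring the metric name, which is how PromQL matches them by default. Values always come from the side the element was taken from. The vectors below are hypothetical.

```python
def _key(labels):
    # Set operators match on the full label set, ignoring the metric name.
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")

def vec_and(v1, v2):
    """Elements of v1 whose label set also appears in v2."""
    keys = {_key(labels) for labels, _ in v2}
    return [(labels, value) for labels, value in v1 if _key(labels) in keys]

def vec_or(v1, v2):
    """All of v1, plus elements of v2 whose label set is not in v1."""
    keys = {_key(labels) for labels, _ in v1}
    return v1 + [(labels, value) for labels, value in v2 if _key(labels) not in keys]

def vec_unless(v1, v2):
    """Elements of v1 whose label set does not appear in v2."""
    keys = {_key(labels) for labels, _ in v2}
    return [(labels, value) for labels, value in v1 if _key(labels) not in keys]

v1 = [({"instance": "a"}, 1.0), ({"instance": "b"}, 2.0)]
v2 = [({"instance": "b"}, 20.0), ({"instance": "c"}, 30.0)]
print(vec_and(v1, v2))     # [({'instance': 'b'}, 2.0)]
print(vec_or(v1, v2))      # a and b from v1, c from v2
print(vec_unless(v1, v2))  # [({'instance': 'a'}, 1.0)]
```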

　　Operator precedence:

　　- CPU usage of each host: ( 1 - avg (irate(node_cpu{mode='idle'}[5m])) by(instance))

　　PromQL operators from highest to lowest precedence:

　　^
　　*, /, %
　　+, -
　　==, !=, <=, <, >=, >
　　and, unless
　　or

6. Aggregation Operators

　　Prometheus's built-in aggregation operators aggregate the samples returned by an instant vector expression into new time series:

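　　sum (sum)

　　min (minimum)

　　max (maximum)

　　avg (average)

　　stddev (standard deviation)

　　stdvar (variance)

　　count (number of elements)

　　count_values (number of elements with the same value)

　　bottomk (the n series with the smallest values)

　　topk (the n series with the largest values)

　　quantile (φ-quantile of the values)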

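　　Of these, only count_values, quantile, topk and bottomk take a parameter. without removes the listed labels from the result and keeps all the others. by does the opposite and keeps only the listed labels.

　　count_values counts how many series share each distinct sample value. It outputs one time series per distinct value. Each output series carries an extra label, named by the parameter, whose value is that sample value. In short, it counts the values and attaches each value as a label.

　　- Count HTTP request series by their value: count_values("count", http_requests_total)

　　topk and bottomk sort the samples by value and return the time series with the n largest or n smallest values.

　　- The 5 series with the most HTTP requests: topk(5, http_requests_total)

　　quantile computes the distribution of the current sample values: quantile(φ, expression), where 0 ≤ φ ≤ 1.

　　- Median of the current samples: quantile(0.5, http_requests_total)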
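　　The three parameterised aggregations can be sketched in Python over a single group. by/without grouping is left out here. The quantile follows the interpolation PromQL's quantile aggregation applies (rank = φ * (n - 1), blending the two neighbouring sorted values). The formatting of count_values label text is simplified and may not match Prometheus exactly for non-integer values.

```python
import math
from collections import Counter

def count_values(label, vector):
    """One output element per distinct value; the value becomes label text."""
    counts = Counter(value for _, value in vector)
    def text(v):
        # Simplified formatting: 100.0 -> "100", otherwise Python's repr.
        return str(int(v)) if v.is_integer() else repr(v)
    return [({label: text(v)}, float(n)) for v, n in counts.items()]

def topk(k, vector):
    """Elements with the k largest values."""
    return sorted(vector, key=lambda item: item[1], reverse=True)[:k]

def quantile(phi, vector):
    """phi-quantile across elements with linear interpolation."""
    if not 0 <= phi <= 1:
        raise ValueError("phi must be within [0, 1]")
    values = sorted(value for _, value in vector)
    if not values:
        return math.nan
    rank = phi * (len(values) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(values) - 1)
    weight = rank - lower
    return values[lower] * (1 - weight) + values[upper] * weight

data = [({"path": "/a"}, 1.0), ({"path": "/b"}, 2.0),
        ({"path": "/c"}, 3.0), ({"path": "/d"}, 10.0)]
print(topk(2, data))        # /d then /c
print(quantile(0.5, data))  # 2.5
```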

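7. Built-in Functions

　　Calculating the growth rate of a Counter

　　Counter growth rate: as long as no reset occurs, a Counter metric only increases.

　　- increase(v range-vector): v is a range vector. increase(node_cpu[2m]) / 120 gives the average growth rate of node_cpu over the last two minutes, where 120 is the window length in seconds.

　　- rate(v range-vector): rate directly calculates the average per-second growth rate of the range vector v over the time window. rate(node_cpu[2m]) therefore gives the same result as the increase expression above.

　　Note: rate and increase are prone to the long-tail problem. For example, within the 2-minute window a host may briefly hit 100% CPU because of a traffic spike or some other issue. The average growth rate over the window does not reflect this. The more sensitive irate addresses the problem.

　　- irate(v range-vector): calculates the instantaneous growth rate from the last two samples in the range, e.g. irate(node_cpu[2m])

　　irate is more sensitive than rate, but rate is recommended for analyzing long-term trends and in alerting rules.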
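　　The Python sketch below contrasts the two functions on a hypothetical CPU-seconds counter sampled every 15 s for 2 minutes. CPU sits at 10% until a 100% burst in the final interval. rate averages over the whole window, while irate looks only at the last two samples. Both correct counter resets. Unlike Prometheus, rate here does not extrapolate to the window edges.

```python
def rate(samples):
    """Average per-second increase over the window (no extrapolation).

    samples: list of (timestamp_seconds, counter_value), sorted, at least two.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur  # reset correction
    return increase / (samples[-1][0] - samples[0][0])

def irate(samples):
    """Per-second increase between the last two samples only."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)

# 1.5 busy seconds per 15 s (10%), then 15 busy seconds in the last 15 s (100%).
samples = [(15 * i, 1.5 * i) for i in range(8)] + [(120, 25.5)]
print(rate(samples))   # 0.2125: the burst is averaged away
print(irate(samples))  # 1.0: the burst is visible
```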

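　　Predicting how a Gauge will change

　　- predict_linear(v range-vector, t scalar) predicts the value of time series v t seconds from now, using simple linear regression. For example, the following uses 2 hours of samples to predict whether a host's free disk space will run out within 4 hours: predict_linear(node_filesystem_free{device="/dev/mapper/centos-root"}[2h] , 4 * 3600) < 0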
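　　Simple linear regression here means an ordinary least-squares line through the (timestamp, value) samples, extended t seconds past the evaluation time. The Python sketch below shows that calculation. It needs at least two samples with distinct timestamps. The test data is a synthetic series that drops by 2 per second.

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """Least-squares line through the samples, evaluated at eval_time + seconds_ahead.

    samples: list of (timestamp_seconds, value).
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    # The fitted line passes through (mean_t, mean_v).
    return mean_v + slope * (eval_time + seconds_ahead - mean_t)

free = [(t, 100.0 - 2 * t) for t in range(11)]  # synthetic: 2 units lost per second
print(predict_linear(free, 10, 5))               # 70.0
print(predict_linear(free, 10, 60) < 0)          # True: "disk full" within 60 s
```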

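　　Dynamic label replacement

　　To make charts on the client side more readable, label_replace can add an extra label to a time series.

　　For example: label_replace(up, "host", "$1", "instance", "(.*):.*")

　　Here host is the new label. $1 is the value captured by the first () group of the regular expression when it is matched against the instance label. If the regular expression does not match, the series is returned unchanged.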
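　　A Python sketch of label_replace for a single series' labels. The regular expression must match the whole source value, as in PromQL. Python's re module stands in for RE2. Only the $1, $2, ... form of group references is handled, and referenced groups are assumed to exist.

```python
import re

def label_replace(labels, dst, replacement, src, regex):
    """Return a copy of labels with dst set from src via regex and replacement."""
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)  # no match: series unchanged
    # Expand $1, $2, ... to the captured groups (an unmatched group becomes "").
    value = re.sub(r"\$(\d+)",
                   lambda g: match.group(int(g.group(1))) or "",
                   replacement)
    result = dict(labels)
    result[dst] = value
    return result

up = {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"}
print(label_replace(up, "host", "$1", "instance", "(.*):.*")["host"])  # localhost
```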

Posted on 2018-12-24 17:03, 杜先生的博客
