When Prometheus PV usage shows abnormal overhead



Symptoms

The PV used by Prometheus showed an abnormal amount of overhead.
It was already consuming about 70 GB right after startup.

prometheus version: 2.18.2

Cause

I opened a shell in the prometheus container with kubectl exec:

kubectl exec -it prometheus-kube-prometheus-stack-prometheus-0 -n monitoring -c prometheus -- sh

The data directory contained many directories with names like 01EN4ZAY5H2KD172GTD9X7JJC3.tmp.
Judging by their creation dates, they were clearly older than the configured retention period.
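The check described above can be sketched as a small shell helper. This is only an illustration: the function names list_tmp_blocks and tmp_blocks_usage are hypothetical, and the default path /prometheus is an assumption about where the TSDB data directory is mounted in the container, so adjust it to your setup.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting leftover *.tmp block directories.

# Print every directory named *.tmp directly under the data directory.
# The default /prometheus is an assumed mount path, not taken from the article.
list_tmp_blocks() {
  data_dir="${1:-/prometheus}"
  # -maxdepth 1 keeps the search at the top level of the data directory.
  find "$data_dir" -maxdepth 1 -type d -name '*.tmp' | sort
}

# For each leftover directory, show its disk usage and its listing
# (the ls -ld line includes the last modification date).
tmp_blocks_usage() {
  list_tmp_blocks "$1" | while read -r dir; do
    du -sh "$dir"
    ls -ld "$dir"
  done
}
```

Running tmp_blocks_usage inside the container should show how much space each leftover directory takes and how old it is, which can then be compared against the retention setting.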

After I deleted them, PV usage dropped to a reasonable level.
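A cautious version of that cleanup might look like the following sketch. The function name remove_stale_tmp_blocks and the age threshold are assumptions added for illustration. The threshold is there because a compaction that is still running may be writing to a recent .tmp directory, so only directories untouched for some time are removed.

```shell
#!/bin/sh
# Hypothetical cleanup helper: remove *.tmp directories directly under the
# TSDB data directory that have not been modified recently.
remove_stale_tmp_blocks() {
  data_dir="$1"
  min_age_days="${2:-1}"   # assumed default; choose a value that suits your retention
  [ -d "$data_dir" ] || { echo "not a directory: $data_dir" >&2; return 1; }
  # -mtime +N matches entries last modified more than N whole days ago
  # (find rounds the age down to full days before comparing).
  find "$data_dir" -maxdepth 1 -type d -name '*.tmp' -mtime +"$min_age_days" \
    -exec rm -rf {} \;
}
```

For example, remove_stale_tmp_blocks /prometheus 7 would delete only .tmp directories left untouched for more than a week, where /prometheus is again an assumed data path. Listing the candidates first, before deleting anything, is the safer order.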

What on earth are these?

After some digging, I found a report saying that these .tmp directories are created when compaction or retention of the metrics data fails.

Issue: https://github.com/prometheus/prometheus/issues/8180

We face TSDB storage filling up because of .tmp directories that are left. We understand that these tmp directories result from failed compaction or retention, but even though the Prometheus pods are healthy the left tmp dirs are not rotated / removed at any time later.

According to the issue, the underlying cause is insufficient memory.

The actual issue is that Prometheus did not have enough memory.

Come to think of it, our Prometheus did go down several times because it ran out of memory.
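One way to confirm such memory-related crashes is to ask Kubernetes why the container last terminated. The sketch below wraps a kubectl jsonpath query. The function name last_termination_reason is hypothetical, and it assumes kubectl is configured for the cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the reason for a container's last termination.
# A result of "OOMKilled" means the container was killed for exceeding its memory limit.
last_termination_reason() {
  pod="$1"
  namespace="$2"
  container="$3"
  # The jsonpath filter selects the status entry for the named container.
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath="{.status.containerStatuses[?(@.name==\"$container\")].lastState.terminated.reason}"
}
```

With the pod from this article, the call would be last_termination_reason prometheus-kube-prometheus-stack-prometheus-0 monitoring prometheus. Empty output means no previous termination is recorded for that container.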

Apparently, a change that automatically deletes the .tmp directories at startup was merged in mid-August for version 2.21.

PR: https://github.com/prometheus/prometheus/pull/7772
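To tell whether a given deployment predates that change, a version comparison along these lines could be used. The function name is_older_than_2_21 is hypothetical, and the sketch assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit status 0) when the given Prometheus version
# is older than 2.21.0, i.e. it predates the automatic .tmp cleanup at startup.
is_older_than_2_21() {
  version="${1#v}"                       # accept both "2.18.2" and "v2.18.2"
  [ "$version" = "2.21.0" ] && return 1  # exactly 2.21.0 is not older
  # Version-sort the two values; the first line is the older one.
  oldest=$(printf '%s\n%s\n' "$version" "2.21.0" | sort -V | head -n 1)
  [ "$oldest" = "$version" ]
}
```

For the version in this article, is_older_than_2_21 2.18.2 succeeds, so that deployment would not yet clean up the directories on its own. The running version can be read with prometheus --version inside the container.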

The issue status is still Open, perhaps because the discussion about supporting earlier versions is ongoing.

So, for now, I need to update Prometheus right away!

Author And Source

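More information on this topic (When Prometheus PV usage shows abnormal overhead) can be found in the original article: https://qiita.com/yo_C_ta/items/c15e8530b1d496cfa7e4

Attribution: information about the original author is available at the original URL. Copyright belongs to the original author.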
