A Prometheus Service Discovery Plugin for Swarm


Recently, because a project needed it, I implemented Swarm service discovery for Prometheus myself, to make collecting metrics easier. The code is on GitHub under retrieval/discovery/swarm. It is based on Prometheus 0.17.0 and Swarm v1.1.0.
System requirements
Swarm itself does not expose a metrics endpoint for Prometheus, so cAdvisor has to run on every Swarm master and node. By default the plugin scrapes cAdvisor's /metrics endpoint on port 8070 (configurable).
Configuration

- job_name: service-swarm
  swarm_sd_configs:
  - masters:
    - 'http://swarm.example.com:8080'

    refresh_interval: 1s
    metrics_port: '8060'

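refresh_interval specifies how often the plugin refreshes its target information from the Swarm master;

metrics_port specifies the port of cAdvisor's /metrics endpoint.

Label-related configuration (relabeling) works the same way as for the Kubernetes service discovery.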

How it works
Prometheus service discovery plugin interface

// prometheus/retrieval/targetmanager.go

// A TargetProvider provides information about target groups. It maintains a set
// of sources from which TargetGroups can originate. Whenever a target provider
// detects a potential change, it sends the TargetGroup through its provided channel.
//
// The TargetProvider does not have to guarantee that an actual change happened.
// It does guarantee that it sends the new TargetGroup whenever a change happens.
//
// Sources() is guaranteed to be called exactly once before each call to Run().
// On a call to Run() implementing types must send a valid target group for each of
// the sources they declared in the last call to Sources().
type TargetProvider interface {
	// Sources returns the source identifiers the provider is currently aware of.
	Sources() []string
	// Run hands a channel to the target provider through which it can send
	// updated target groups. The channel must be closed by the target provider
	// if no more updates will be sent.
	// On receiving from done Run must return.
	Run(up chan<- config.TargetGroup, done <-chan struct{})
}

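Sources() []string returns the identifiers of the sources the provider currently knows about. The target manager uses each returned string as the ID of a target group.

Run(up chan<- config.TargetGroup, done <-chan struct{}) is handed a channel by the target manager. The provider sends updated target groups through up, closes up when it will send no more updates, and must return once it receives from done.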
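The contract can be illustrated with a self-contained Go sketch. To avoid depending on the Prometheus 0.17 packages, it uses local stand-ins for model.LabelSet and config.TargetGroup; staticProvider and the source ID swarm:nodes are hypothetical. It sends one group for every source it declared, as the interface comment requires.

```go
package main

import "fmt"

// LabelSet stands in for model.LabelSet (label name -> label value).
type LabelSet map[string]string

// TargetGroup mirrors config.TargetGroup.
type TargetGroup struct {
	Targets []LabelSet
	Labels  LabelSet
	Source  string
}

// TargetProvider mirrors the interface quoted above.
type TargetProvider interface {
	Sources() []string
	Run(up chan<- TargetGroup, done <-chan struct{})
}

// staticProvider is a hypothetical provider with a fixed set of groups.
type staticProvider struct {
	groups []TargetGroup
}

// Compile-time check that staticProvider satisfies TargetProvider.
var _ TargetProvider = (*staticProvider)(nil)

// Sources returns one identifier per group; each becomes a target group ID.
func (p *staticProvider) Sources() []string {
	srcs := make([]string, 0, len(p.groups))
	for _, g := range p.groups {
		srcs = append(srcs, g.Source)
	}
	return srcs
}

// Run sends a group for every declared source, stopping early if done is
// closed, and closes up on return because nothing more will be sent.
func (p *staticProvider) Run(up chan<- TargetGroup, done <-chan struct{}) {
	defer close(up)
	for _, g := range p.groups {
		select {
		case up <- g:
		case <-done:
			return
		}
	}
}

func main() {
	p := &staticProvider{groups: []TargetGroup{{
		Targets: []LabelSet{{"__address__": "10.199.128.49:8070"}},
		Labels:  LabelSet{"role": "node"},
		Source:  "swarm:nodes",
	}}}
	up := make(chan TargetGroup)
	done := make(chan struct{})
	go p.Run(up, done)
	for tg := range up {
		fmt.Println(tg.Source, len(tg.Targets)) // swarm:nodes 1
	}
	close(done)
}
```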

// prometheus/config/config.go

// TargetGroup is a set of targets with a common label set.
type TargetGroup struct {
	// Targets is a list of targets identified by a label set. Each target is
	// uniquely identifiable in the group by its address label.
	Targets []model.LabelSet
	// Labels is a set of labels that is common across all targets in the group.
	Labels model.LabelSet

	// Source is an identifier that describes a group of targets.
	Source string
}

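Targets: one label set per target. Prometheus parses them to obtain the address at which each target's metrics are scraped, and their labels are attached to the scraped metric records.

Labels: labels common to all targets in the group; they are attached to every metric record.

Source: the group's identifier, equal to one of the IDs returned by Sources().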
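As an illustration of how these three fields might be filled for Swarm nodes, the sketch below points each target's __address__ label (the label Prometheus scrapes) at cAdvisor's metrics port instead of the Docker port 2375. The Node type, the __meta_swarm_* label names and the source ID swarm:nodes are hypothetical, not taken from the plugin.

```go
package main

import (
	"fmt"
	"net"
)

type LabelSet map[string]string

type TargetGroup struct {
	Targets []LabelSet
	Labels  LabelSet
	Source  string
}

// Node is a hypothetical parsed node: its name and Docker endpoint "ip:2375".
type Node struct {
	Name string
	Addr string
}

// nodesTargetGroup builds one group with a target per node, keeping the
// node's IP but replacing the Docker port with the cAdvisor metrics port.
func nodesTargetGroup(nodes []Node, metricsPort string) (TargetGroup, error) {
	tg := TargetGroup{
		Labels: LabelSet{"__meta_swarm_role": "node"}, // hypothetical common label
		Source: "swarm:nodes",                         // hypothetical source ID
	}
	for _, n := range nodes {
		host, _, err := net.SplitHostPort(n.Addr)
		if err != nil {
			return TargetGroup{}, fmt.Errorf("node %s: %v", n.Name, err)
		}
		tg.Targets = append(tg.Targets, LabelSet{
			"__address__":            net.JoinHostPort(host, metricsPort),
			"__meta_swarm_node_name": n.Name, // hypothetical per-target label
		})
	}
	return tg, nil
}

func main() {
	tg, err := nodesTargetGroup([]Node{
		{Name: "hh-yun-k8s-128049.vclound.com", Addr: "10.199.128.49:2375"},
	}, "8070")
	if err != nil {
		panic(err)
	}
	fmt.Println(tg.Targets[0]["__address__"]) // 10.199.128.49:8070
}
```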

How Swarm service discovery works
Nodes
1. Periodically query the Swarm master's REST API endpoint /info to get the latest cluster information.

// Swarm master /info API response body

{
 "ID": "",
 "Containers": 16,
 "ContainersRunning": 10,
 "ContainersPaused": 0,
 "ContainersStopped": 6,
 "Images": 30,
 "Driver": "",
 "DriverStatus": null,
 "SystemStatus": [
  [
   "Role",
   "primary"
  ],
  [
   "Strategy",
   "spread"
  ],
  [
   "Filters",
   "health, port, dependency, affinity, constraint"
  ],
  [
   "Nodes",
   "2"
  ],
  [
   " hh-yun-k8s-128049.vclound.com",
   "10.199.128.49:2375"
  ],
  [
   " └ Status",
   "Healthy"
  ],
  [
   " └ Containers",
   "8"
  ],
  [
   " └ Reserved CPUs",
   "12 / 25"
  ],
  [
   " └ Reserved Memory",
   "3.75 GiB / 132 GiB"
  ],
  [
   " └ Labels",
   "executiondriver=native-0.2, kernelversion=3.10.0-229.4.2.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper"
  ],
  [
   " └ Error",
   "(none)"
  ],
  [
   " └ UpdatedAt",
   "2016-04-05T09:22:57Z"
  ],
  [
   " hh-yun-k8s-128050.vclound.com",
   "10.199.128.50:2375"
  ],
  [
   " └ Status",
   "Healthy"
  ],
  [
   " └ Containers",
   "8"
  ],
  [
   " └ Reserved CPUs",
   "12 / 25"
  ],
  [
   " └ Reserved Memory",
   "3 GiB / 132 GiB"
  ],
  [
   " └ Labels",
   "executiondriver=native-0.2, kernelversion=3.10.0-229.4.2.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper"
  ],
  [
   " └ Error",
   "(none)"
  ],
  [
   " └ UpdatedAt",
   "2016-04-05T09:23:29Z"
  ]
 ],
 "Plugins": {
  "Volume": null,
  "Network": null,
  "Authorization": null
 },
 "MemoryLimit": true,
 "SwapLimit": true,
 "CpuCfsPeriod": true,
 "CpuCfsQuota": true,
 "CPUShares": true,
 "CPUSet": true,
 "IPv4Forwarding": true,
 "BridgeNfIptables": true,
 "BridgeNfIp6tables": true,
 "Debug": false,
 "NFd": 0,
 "OomKillDisable": true,
 "NGoroutines": 0,
 "SystemTime": "2016-04-05T17:23:50.830465718+08:00",
 "ExecutionDriver": "",
 "LoggingDriver": "",
 "NEventsListener": 0,
 "KernelVersion": "3.10.0-229.4.2.el7.x86_64",
 "OperatingSystem": "linux",
 "OSType": "",
 "Architecture": "amd64",
 "IndexServerAddress": "",
 "RegistryConfig": null,
 "NCPU": 50,
 "MemTotal": 283536760012,
 "DockerRootDir": "",
 "HttpProxy": "",
 "HttpsProxy": "",
 "NoProxy": "",
 "Name": "hh-yun-k8s-128050.vclound.com",
 "Labels": null,
 "ExperimentalBuild": false,
 "ServerVersion": "",
 "ClusterStore": "",
 "ClusterAdvertise": ""
}

2. Parse the response, extract the node information, wrap it into a target group, and notify the target manager through the channel.

func (d *Discovery) Run(up chan<- config.TargetGroup, done <-chan struct{})
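The parsing in step 2 can be sketched as follows. This heuristic is based solely on the sample response above: pairs before "Nodes" are cluster-wide settings, a key without └ after "Nodes" opens a new node (name → Docker endpoint), and keys starting with └ are attributes of the most recent node. Node, parseNodes and parseLabels are hypothetical names, and a label value that itself contains ", " would be split incorrectly.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical parsed node entry.
type Node struct {
	Name   string
	Addr   string // Docker endpoint, e.g. "10.199.128.49:2375"
	Status string
	Labels map[string]string
}

// parseNodes walks the SystemStatus pairs of a Swarm /info response.
func parseNodes(status [][]string) ([]Node, error) {
	var nodes []Node
	inNodes := false
	for _, kv := range status {
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed SystemStatus entry %q", kv)
		}
		key, val := strings.TrimSpace(kv[0]), kv[1]
		switch {
		case key == "Nodes":
			inNodes = true // node entries follow
		case !inNodes:
			// Role, Strategy, Filters: cluster-wide settings, ignored here.
		case strings.HasPrefix(key, "└"):
			if len(nodes) == 0 {
				return nil, fmt.Errorf("attribute %q appears before any node", key)
			}
			n := &nodes[len(nodes)-1]
			switch strings.TrimSpace(strings.TrimPrefix(key, "└")) {
			case "Status":
				n.Status = val
			case "Labels":
				n.Labels = parseLabels(val)
			}
		default:
			// A key without └ after "Nodes" starts a new node: name -> endpoint.
			nodes = append(nodes, Node{Name: key, Addr: val})
		}
	}
	return nodes, nil
}

// parseLabels splits "k1=v1, k2=v2" into a map.
func parseLabels(s string) map[string]string {
	labels := make(map[string]string)
	for _, part := range strings.Split(s, ", ") {
		if k, v, ok := strings.Cut(part, "="); ok {
			labels[k] = v
		}
	}
	return labels
}

func main() {
	status := [][]string{
		{"Role", "primary"},
		{"Nodes", "2"},
		{" hh-yun-k8s-128049.vclound.com", "10.199.128.49:2375"},
		{" └ Status", "Healthy"},
		{" └ Labels", "executiondriver=native-0.2, storagedriver=devicemapper"},
		{" hh-yun-k8s-128050.vclound.com", "10.199.128.50:2375"},
		{" └ Status", "Healthy"},
	}
	nodes, err := parseNodes(status)
	if err != nil {
		panic(err)
	}
	for _, n := range nodes {
		fmt.Println(n.Name, n.Addr, n.Status)
	}
	fmt.Println(nodes[0].Labels["storagedriver"]) // devicemapper
}
```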

Masters
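The Swarm masters are static and come from the Prometheus configuration file. When the provider starts, it sends the master information straight to the target manager (see the Run method code). When querying the masters for node information, however, a rotation mechanism is used to find the master that is currently working.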

func (c *swarmClient) getNodeInfo() (*Info, error) {
	c.masterMu.Lock()
	defer c.masterMu.Unlock()

	for _, master := range c.masters {
		urlStr := fmt.Sprintf("%s/info", master.String())
		req, err := http.NewRequest("GET", urlStr, nil)
		if err != nil {
			return nil, err
		}

		var resp *http.Response
		if c.do != nil {
			// code for testing
			resp, err = c.do(req)
		} else {
			resp, err = c.client.Do(req)
		}
		if err == nil {
			return c.processNodeInfo(resp)
		}

		// This master failed: rotate it to the back so the next call starts with another one.
		c.rotateMaster()
	}
	return nil, errors.New("No available master.")
}

func (c *swarmClient) rotateMaster() {
	if len(c.masters) > 1 {
		c.masters = append(c.masters[1:], c.masters[0]) // move the first master to the end
	}
}

Some thoughts

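The JSON returned by the Swarm master's /info endpoint is rather crudely formatted: it is simply the output of the docker -H X.X.X.X:2375 info command converted mechanically into key-value pairs. Symbols such as └ appear in the keys, which makes the text awkward to parse.

Besides the REST API approach, one could consider etcd's watch mechanism or Swarm's own discovery mechanism, docker/docker/pkg/discovery.

Obtaining node information by parsing text is a very primitive approach: crude and fragile. I chose it anyway to avoid pulling in extra third-party dependencies as far as possible, which makes it easier to move to newer Prometheus versions later (it is still a 0.x release, after all).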
