
Prometheus in Practice

Installing Prometheus

mkdir -p ~/Apps/prometheus && cd ~/Apps/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.35.0/prometheus-2.35.0.linux-amd64.tar.gz
tar xvfz prometheus-2.35.0.linux-amd64.tar.gz
cd prometheus-2.35.0.linux-amd64
./prometheus --config.file=prometheus.yml

nohup ./prometheus --config.file=prometheus.yml &
--web.enable-lifecycle
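# Purpose: after editing the configuration file later, there is no need to restart the service; the file can be reloaded through the HTTP API.
# Reload the configuration file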
curl -X POST http://localhost:9090/-/reload

http://localhost:9090/graph

localhost:9090/metrics
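
The /metrics endpoint returns plain text in the Prometheus exposition format: comment lines starting with `#`, followed by one `name{labels} value` line per sample. As a rough sketch of how to get an overview of what a target exposes, the helper below prints each distinct metric name once. The function name `metric_names` is hypothetical, and the filter assumes label values do not need special handling beyond stripping everything after the metric name.

```shell
# metric_names: read Prometheus exposition text on stdin and
# print each distinct metric name once (hypothetical helper).
metric_names() {
  # 1. drop comment lines (# HELP / # TYPE)
  # 2. cut everything from the first "{" or space onward
  # 3. drop empty lines and de-duplicate
  grep -v '^#' | sed -e 's/[{ ].*//' | grep -v '^$' | sort -u
}

# Usage against a running Prometheus (not executed here):
#   curl -s http://localhost:9090/metrics | metric_names
```

The same pipe works for any exporter endpoint mentioned in this post, since they all serve the same text format.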

node_exporter

cd ~/Apps/prometheus
mkdir -p exporters && cd exporters
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar -xzvf node_exporter-1.3.1.linux-amd64.tar.gz
cd node_exporter-1.3.1.linux-amd64

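# Start three instances on different ports (each runs in the foreground, so use a separate terminal for each)
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082
# http://localhost:8080/metrics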

vim prometheus.yml
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
    monitor: 'codelab-monitor'

rule_files:
  - 'prometheus.rules.yml'

scrape_configs:
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name:       'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'

prometheus.rules.yml

groups:
- name: cpu-node
  rules:
  - record: job_instance_mode:node_cpu_seconds:avg_rate5m
    expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))

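Restart Prometheus with the new configuration, then query or graph the metric in the expression browser to verify that the new time series named job_instance_mode:node_cpu_seconds:avg_rate5m is now available.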
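
To build some intuition for what the rule expression `avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))` computes, here is a deliberately simplified sketch: it takes the per-second increase of each counter series between its first and last sample, then averages those rates across series (for this rule, across CPUs). The function name `avg_rate` and the `series timestamp value` input layout are assumptions for illustration. Prometheus's real `rate()` also handles counter resets and extrapolates to the window boundaries, which this sketch ignores.

```shell
# avg_rate: read "series timestamp value" lines on stdin and print the
# average per-second rate across all series (hypothetical helper).
avg_rate() {
  awk '
    # remember the first sample seen for each series
    !($1 in first_t) { first_t[$1] = $2; first_v[$1] = $3 }
    # keep overwriting so the last sample wins
    { last_t[$1] = $2; last_v[$1] = $3 }
    END {
      for (s in first_t) {
        dt = last_t[s] - first_t[s]
        if (dt > 0) { sum += (last_v[s] - first_v[s]) / dt; n++ }
      }
      if (n > 0) printf "%.3f\n", sum / n
    }'
}

# Example input: two CPUs sampled at t=0s and t=300s (made-up values).
#   printf 'cpu0 0 100\ncpu0 300 130\ncpu1 0 200\ncpu1 300 290\n' | avg_rate
```

With the made-up samples in the comment, the two series increase by 30 and 90 seconds over 300 seconds, so the individual rates are 0.1 and 0.3 and their average is 0.2.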

mysqld_exporter

Exporter for MySQL server metrics: prometheus/mysqld_exporter

cd /opt/prometheus/exporters
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-amd64.tar.gz
tar -xzvf mysqld_exporter-0.14.0.linux-amd64.tar.gz

cd mysqld_exporter-0.14.0.linux-amd64
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'XXXXXXXX' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';

Running using an environment variable:

export DATA_SOURCE_NAME='exporter:XXXXXXXX@(hostname:3306)/'

# ./mysqld_exporter <flags>
nohup ./mysqld_exporter &
# http://localhost:9104/metrics

postgres_exporter

Exporter for PostgreSQL server metrics: prometheus-community/postgres_exporter

cd /opt/prometheus/exporters
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.10.1/postgres_exporter-0.10.1.linux-amd64.tar.gz
tar -xzvf postgres_exporter-0.10.1.linux-amd64.tar.gz

cd postgres_exporter-0.10.1.linux-amd64/

# Environment variable for the database connection
export DATA_SOURCE_NAME="postgresql://postgres:postgres@172.27.3.66:5432/goldoffice_sit?sslmode=disable"

nohup ./postgres_exporter &

# Alternatively, run in the foreground with explicit flags
./postgres_exporter \
    --web.listen-address=:9187 \
    --web.telemetry-path=/metrics \
    --extend.query-path=/opt/prometheus/exporters/postgres_exporter-0.10.1.linux-amd64/custom-queries.yaml

http://localhost:9187/metrics

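# Custom metrics: many metrics are exposed by default; add custom queries if they are not enough.
# Use the raw file URL for the release in use (a /blob/ URL serves an HTML page, not the YAML).
wget https://raw.githubusercontent.com/prometheus-community/postgres_exporter/v0.10.1/queries.yaml
cp queries.yaml custom-queries.yaml
export PG_EXPORTER_EXTEND_QUERY_PATH="/opt/prometheus/exporters/postgres_exporter-0.10.1.linux-amd64/custom-queries.yaml"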
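
When fetching a configuration file from GitHub, it is easy to end up with an HTML page saved under a .yaml name, which the exporter cannot parse. A minimal sanity check, assuming it is enough to look at the first non-blank line of the file, might look like this. The function name `is_html` is hypothetical.

```shell
# is_html: succeed (exit 0) if the first non-blank line of file $1
# looks like HTML markup (hypothetical helper).
is_html() {
  first=$(grep -m1 -v '^[[:space:]]*$' "$1" || true)
  case "$first" in
    *'<!DOCTYPE'*|*'<!doctype'*|*'<html'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not executed here):
#   if is_html queries.yaml; then echo "downloaded an HTML page, not YAML" >&2; fi
```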
# custom-queries.yaml, modeled on queries.yaml
# Note: on PostgreSQL 10 and later these functions are named
# pg_wal_lsn_diff and pg_current_wal_lsn.
pg_stat_slots:
  query: |
    SELECT
      slot_name,
      pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) AS replication_slot_lag,
      active
    FROM
      pg_replication_slots
  metrics:
    - slot_name:
        usage: "LABEL"
        description: "Slot name"
    - replication_slot_lag:
        usage: "LABEL"
        description: "Replication slot lag"
    - active:
        usage: "DISCARD"
        description: "Active status"

Updating the Prometheus configuration

  - job_name: "postgres_exporter"
    static_configs:
      - targets: ["localhost:9187"]

Grafana

Grafana official website

mkdir -p ~/Apps/grafana && cd ~/Apps/grafana
sudo apt-get install -y adduser libfontconfig1
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_8.5.2_amd64.deb
 
sudo dpkg -i grafana-enterprise_8.5.2_amd64.deb

# On WSL, use service in place of systemctl
sudo service grafana-server start
sudo service grafana-server status
# Configure the Grafana server to start at boot:
sudo update-rc.d grafana-server defaults

# http://localhost:3000/
# Default username: admin
# Default password: admin

cat /etc/init.d/grafana-server
  • Installs binary to /usr/sbin/grafana-server
  • Installs Init.d script to /etc/init.d/grafana-server
  • Creates default file (environment vars) to /etc/default/grafana-server
  • Installs configuration file to /etc/grafana/grafana.ini
  • Installs systemd service (if systemd is available) name grafana-server.service
  • The default configuration sets the log file at /var/log/grafana/grafana.log
  • The default configuration specifies a SQLite3 db at /var/lib/grafana/grafana.db
  • Installs HTML/JS/CSS and other Grafana files at /usr/share/grafana
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.2.linux-amd64.tar.gz
tar -zxvf grafana-enterprise-8.5.2.linux-amd64.tar.gz

The Grafana dashboards with IDs 455 and 9628 are PostgreSQL templates that can be used as references.

PostgreSQL: monitoring replication slot status

/images/cloud-native/prometheus/image-20220512162902995.png

PostgreSQL: monitoring replication slot size

/images/cloud-native/prometheus/image-20220512162953568.png

Spring Boot

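If you look at the Prometheus documentation, it suggests adding the Prometheus JMX Exporter or the Prometheus Java client to your application.

Micrometer helps you instrument your application and publishes those metrics so that many different monitoring systems, including Prometheus, can scrape them.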

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Add the Micrometer registry dependency that specifically enables Prometheus support. This lets the metrics collected by Micrometer be exposed in the Prometheus format. -->
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
  <scope>runtime</scope>
</dependency>
# The Actuator dependency already includes Micrometer
mvn dependency:tree -Dincludes=io.micrometer:micrometer-core

application.properties

management.endpoints.web.exposure.include=health,info,prometheus

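Many metrics are then exposed automatically on the Actuator endpoint, /actuator/prometheus.

Custom metrics

Often the basic metrics that Micrometer provides out of the box are all you need. But you may also want to add your own custom metrics.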

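| Metric type | Use it for… | Examples |
| --- | --- | --- |
| Gauge | Measuring resource usage, capacity, and so on: values that can go up and down and have a fixed upper bound | Size of a collection, number of running threads, number of messages on a queue, memory usage |
| Counter | Measuring a number of events or actions: a value that only increases and never decreases | Total number of orders processed, total number of tasks completed, etc. |
| Timer | Measuring short-duration events and their frequency | Method execution time, request duration, time taken to boil an egg |

A custom Timer requires the Spring AOP dependency.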

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

Register the bean in the Spring context

// TimedAspect depends on AspectJ
@Bean
public TimedAspect timedAspect(MeterRegistry registry) {
  return new TimedAspect(registry);
}

Then find the method you want to time and add the @Timed annotation to it. Use the value attribute to name the metric.

// io.micrometer.core.annotation.Timed
@Timed(value = "greeting.time", description = "Time taken to return greeting")
public Greeting getGreeting() {
    return new Greeting();
}
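
Once the timed method has been called a few times, the timer shows up in the /actuator/prometheus output. The sketch below assumes Micrometer's usual Prometheus naming, under which a timer named `greeting.time` is exported as `greeting_time_seconds_count` and `greeting_time_seconds_sum` (plus `greeting_time_seconds_max`), and that sample lines carry no trailing timestamp. The helper name `timer_avg` is hypothetical. It sums `_sum` and `_count` across all label sets and prints the mean duration in seconds.

```shell
# timer_avg BASE: read Prometheus exposition text on stdin and print the
# mean duration (sum / count) for the timer BASE (hypothetical helper).
timer_avg() {
  awk -v base="$1" '
    /^#/ { next }                       # skip HELP / TYPE comments
    {
      name = $1
      sub(/\{.*/, "", name)             # strip the label set, if any
      if (name == (base "_sum"))   sum   += $NF
      if (name == (base "_count")) count += $NF
    }
    END { if (count > 0) printf "%.3f\n", sum / count }'
}

# Usage against a running application (not executed here):
#   curl -s http://localhost:8080/actuator/prometheus | timer_avg greeting_time_seconds
```

This is a lifetime average since application start. For an average over a recent window, dividing the rate of the sum by the rate of the count in PromQL is the more common approach.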

Importing the metrics into Prometheus

prometheus.yml

scrape_configs:
  - job_name: 'spring boot scrape'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080']

Appendix