Prometheus

Prometheus is a monitoring application and time-series database. Anomify can provide an overview of the health of your Prometheus metrics and notify you when they change unexpectedly.

You can send metrics from Prometheus to Anomify by adding a remote_write section to your configuration file. Here is an example configuration:

remote_write:
- url: <PROMETHEUS_WRITE_ENDPOINT>
  queue_config:
    max_samples_per_send: 1000
  headers:
    key: <ANOMIFY_API_KEY>
    x-tenant-id: <ANOMIFY_ORG_ID>
    x-server-id: 1
    x-server-url: <YOUR_PROMETHEUS_URL>
    dropLabels: "[['monitor','master']]"
  metadata_config:
    send: true
    send_interval: 1m
    max_samples_per_send: 1000

You will find the unique values for these variables on the Prometheus integrations settings page in your Anomify dashboard: PROMETHEUS_WRITE_ENDPOINT, ANOMIFY_API_KEY and ANOMIFY_ORG_ID.

YOUR_PROMETHEUS_URL is a reference to your Prometheus server; it does NOT have to be accessible to Anomify. It could be, for example, http://127.0.0.1. In the future we will use it to create links from the Anomify dashboard directly to your own Prometheus metrics.

ANOMIFY_ORG_ID is the Anomify organisation ID you are given when you sign up for an account. Anomify uses this ID to keep your metrics separate from those of other organisations during analysis. The label _tenant_id="<ANOMIFY_ORG_ID>" is added to your metrics when you view them in Anomify.

x-server-id will be 1 unless you have multiple Prometheus instances, in which case you need to give each server a unique identifier. The _server_id label is appended to metrics and can be used to tell apart metrics that are otherwise identical.
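The effect of these two headers on a metric's labels can be illustrated with a short Python sketch. It assumes, as described above, that Anomify simply adds _tenant_id and _server_id to each series' existing labels. The function name add_anomify_labels and the values org123, 1 and 2 are hypothetical, and this is not Anomify's own code.

```python
def add_anomify_labels(labels: dict[str, str], tenant_id: str, server_id: str) -> dict[str, str]:
    """Return a copy of a series' labels with the Anomify labels added.

    Illustrative only: models the documented x-tenant-id -> _tenant_id and
    x-server-id -> _server_id behaviour, not Anomify's implementation.
    """
    result = dict(labels)  # leave the caller's dict untouched
    result["_tenant_id"] = tenant_id
    result["_server_id"] = server_id
    return result


# The same series scraped by two Prometheus servers (hypothetical values).
series = {"__name__": "up", "job": "node", "instance": "10.0.0.5:9100"}
from_server_1 = add_anomify_labels(series, "org123", "1")
from_server_2 = add_anomify_labels(series, "org123", "2")

# Without _server_id these two series would be indistinguishable.
print(from_server_1 == from_server_2)  # False
```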

headers

In the headers section you must provide:

  headers:
    key: <ANOMIFY_API_KEY>
    x-tenant-id: <ANOMIFY_ORG_ID>
    x-server-id: 1
    x-server-url: <YOUR_PROMETHEUS_URL>

Optionally, you can pass three other headers:

x-test-only lets you send metrics to Anomify without Anomify ingesting and processing them. Instead, Anomify just reports which metrics would be processed and which would be dropped (see the Testing section below).

    x-test-only: true

The following two headers are Anomify-specific: they do NOT instruct Prometheus to do anything other than send the header. Prometheus will NOT drop the matching metrics or labels; Anomify will. Because Prometheus still sends all the data that these headers match, it is ultimately best to do this filtering with write_relabel_configs rules in the Prometheus remote_write section.

dropMetrics allows you to drop metrics that match certain patterns.

    dropMetrics: "['^flag{.*', '^info_.*{.*']"
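To reason about what a dropMetrics list will catch, the following Python sketch assumes that each pattern is a regular expression searched for anywhere in the series rendered as name{label="value",...}. That assumption is consistent with the ^flag{ pattern here and with the localhost:9100 example in the Testing section, but the rendering and the function names render and is_dropped are hypothetical, not Anomify's documented behaviour.

```python
import ast
import re

# Header value exactly as written in the remote_write headers section.
DROP_METRICS_HEADER = "['^flag{.*', '^info_.*{.*']"


def render(name: str, labels: dict[str, str]) -> str:
    """Render a series as name{k="v",...} with labels sorted by name (assumed format)."""
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}}"


def is_dropped(series: str, header: str = DROP_METRICS_HEADER) -> bool:
    """True if any dropMetrics pattern matches anywhere in the rendered series."""
    patterns = ast.literal_eval(header)  # the header is a Python-style list literal
    return any(re.search(p, series) for p in patterns)


print(is_dropped(render("flag", {"job": "app"})))                 # True
print(is_dropped(render("http_requests_total", {"job": "app"})))  # False
```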

dropLabels allows you to drop certain labels, either whatever their value or only when they have a certain value. Each rule is a [label, value] pair in a list, where * matches any value. In the example below, the monitor label would be dropped where its value is master, and the type label would be dropped whatever its value.

    dropLabels: "[['monitor','master'],['type','*']]"
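A minimal Python sketch of the [label, value] rule format described above, assuming * is the only wildcard and that a rule removes a label only when both its name and its value match. The function name drop_labels is hypothetical; this models the documented behaviour rather than reproducing Anomify's code.

```python
import ast

# Header value exactly as written in the remote_write headers section.
DROP_LABELS_HEADER = "[['monitor','master'],['type','*']]"


def drop_labels(labels: dict[str, str], header: str = DROP_LABELS_HEADER) -> dict[str, str]:
    """Remove labels matching a [name, value] rule, where '*' matches any value."""
    rules = ast.literal_eval(header)  # parses the list-of-lists literal safely

    def matches(name: str, value: str) -> bool:
        return any(name == rule_name and rule_value in ("*", value)
                   for rule_name, rule_value in rules)

    return {k: v for k, v in labels.items() if not matches(k, v)}


labels = {"code": "200", "job": "prometheus", "monitor": "master", "type": "merge"}
print(drop_labels(labels))  # {'code': '200', 'job': 'prometheus'}
```

Note that a monitor label with any value other than master is left in place, because that rule names a specific value.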

Excluding Metrics

The Developer pricing tier has a limit of 1,000 metrics, beyond which further metrics are dropped automatically. We suggest you configure Prometheus to send only certain key metrics to begin with (e.g. request counts, error counts, KPIs). You can do this with standard Prometheus write_relabel_configs rules that drop unwanted metrics.

You can also add exclude rules in the Anomify dashboard; however, it is ultimately better to send only the metrics you want analysed, to reduce bandwidth, packet size and processing time.

Testing

We understand that writing write_relabel_configs rules so that only certain metrics are remote-written can be a daunting task. You can write a set of rules that you think will work, but it is very difficult to know exactly which metrics will be sent! To help with this, Anomify has a testing feature: you send metrics as normal, and Anomify accepts the data, parses it and reports which metrics were sent, which would be ingested and which would be dropped, without actually ingesting them.

Although Anomify returns a JSON response containing all of this data, Prometheus does not log the response, even when --log.level=debug is added to the Prometheus startup command. The response could be intercepted with tcpdump or a similar tool, but the simpler option is to view the test results in the Anomify dashboard.

The JSON test results returned from the Anomify dashboard hold this information under the ['data']['flux_test_metrics'] key, with metric data from a 5-minute rolling window:

['data']['flux_test_metrics']['metrics'] - a list of metrics that would be ingested (in last 5 minutes)
['data']['flux_test_metrics']['ingest_count'] - the number of metrics that would be ingested (in last 5 minutes)
['data']['flux_test_metrics']['dropped'] - a list of metrics that would be dropped because they match a pattern in dropMetrics (in last 5 minutes)
['data']['flux_test_metrics']['drop_count'] - the number of metrics that would be dropped (in last 5 minutes)
['data']['flux_test_metrics']['metrics_with_no_values'] - a list of metrics that have been sent with no sample values (in last 5 minutes)
['data']['flux_test_metrics']['no_values_count'] - the number of metrics that have been sent with no sample values (in last 5 minutes)
['data']['flux_test_metrics']['metric_details'] - a detailed breakdown of which metric names, labels and values exist, and which labels and values exist in each metric
['data']['flux_test_metrics'][<TIMESTAMP>] - the data from each minute period in the 5 minute rolling window.
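If you save the test results from the dashboard as JSON, a short Python script can pull out the headline numbers. This sketch relies only on the keys listed above and treats any other key under flux_test_metrics as a per-minute <TIMESTAMP> entry; the function name summarise is hypothetical, and the sample response is made up for illustration, not real Anomify output.

```python
import json

# Keys documented under ['data']['flux_test_metrics']; any other key is
# assumed to be a per-minute <TIMESTAMP> entry.
DOCUMENTED_KEYS = {
    "metrics", "ingest_count", "dropped", "drop_count",
    "metrics_with_no_values", "no_values_count", "metric_details",
}


def summarise(raw_json: str) -> dict:
    """Extract headline counts and per-minute keys from a test-results response."""
    results = json.loads(raw_json)["data"]["flux_test_metrics"]
    return {
        "ingest_count": results["ingest_count"],
        "drop_count": results["drop_count"],
        "no_values_count": results["no_values_count"],
        "dropped": results["dropped"],
        "minutes": sorted(k for k in results if k not in DOCUMENTED_KEYS),
    }


# Made-up response for illustration only; real responses will differ.
sample = json.dumps({"data": {"flux_test_metrics": {
    "metrics": ["up"], "ingest_count": 1,
    "dropped": ["flag"], "drop_count": 1,
    "metrics_with_no_values": [], "no_values_count": 0,
    "metric_details": {}, "1675435860": {}, "1675435920": {},
}}})
print(summarise(sample))
```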

Testing Example

To run in test-only mode, add x-test-only: true to the remote_write headers. Remember that if you send some test metrics, evaluate the results, then change your Prometheus config and start sending again, you need to wait for a fresh 5-minute rolling window to be populated before deciding whether the changes had the desired effect. You can make quicker checks during that 5-minute period by looking at the ['data']['flux_test_metrics'][<TIMESTAMP>] keys, but only make a final decision once a full 5-minute window confirms your changes.

remote_write:
- url: <PROMETHEUS_WRITE_ENDPOINT>
  queue_config:
    max_samples_per_send: 1000
  write_relabel_configs:
  - source_labels: [job]
    regex: "(loki|grafana|telegraf)"
    action: drop
  - source_labels: [job]
    regex: "(prometheus)"
    action: keep
  - source_labels: [instance]
    regex: "(hycean.anomify.ai:443|toi700e.anomify.ai:443)"
    action: keep
  headers:
    key: <ANOMIFY_API_KEY>
    x-test-only: true
    x-tenant-id: <ANOMIFY_ORG_ID>
    x-server-id: 1
    x-server-url: <YOUR_PROMETHEUS_URL>
    dropMetrics: "['localhost:9100']"
    dropLabels: "[['monitor','master'],['type', '*']]"
  metadata_config:
    send: true
    send_interval: 1m
    max_samples_per_send: 1000

With the above configuration, Prometheus would not send ANY metrics whose job label is loki, grafana or telegraf, and would keep those whose job is prometheus. (Strictly, the keep rule alone would drop every other job value anyway; the drop rule is included only to show an example pattern.) Prometheus would therefore send only metrics that have job prometheus and an instance label of hycean.anomify.ai:443 or toi700e.anomify.ai:443.
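You can check rules like these offline before testing against Anomify. The Python sketch below approximates Prometheus's keep and drop relabel actions: source label values are joined with the default ; separator (a missing label counts as an empty string), and the regex must match the whole joined value, because Prometheus anchors relabel regexes at both ends. Python's re and Prometheus's RE2 syntax differ in places, and the function name is_sent is hypothetical, so treat this as a rough check rather than a substitute for the test mode.

```python
import re

# Rules from the example config above: (source_labels, regex, action).
RULES = [
    (["job"], "(loki|grafana|telegraf)", "drop"),
    (["job"], "(prometheus)", "keep"),
    (["instance"], "(hycean.anomify.ai:443|toi700e.anomify.ai:443)", "keep"),
]


def is_sent(labels: dict[str, str], rules=RULES) -> bool:
    """Approximate Prometheus keep/drop relabelling for a single series."""
    for source_labels, regex, action in rules:
        # Join source label values with the default ";" separator;
        # a missing label contributes an empty string.
        value = ";".join(labels.get(name, "") for name in source_labels)
        # Prometheus anchors relabel regexes, so the whole value must match.
        matched = re.fullmatch(regex, value) is not None
        if action == "drop" and matched:
            return False
        if action == "keep" and not matched:
            return False
    return True


print(is_sent({"job": "prometheus", "instance": "hycean.anomify.ai:443"}))  # True
print(is_sent({"job": "grafana", "instance": "hycean.anomify.ai:443"}))     # False
print(is_sent({"job": "prometheus", "instance": "localhost:9090"}))         # False
```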

dropMetrics would cause Anomify to drop any metrics containing the string localhost:9100.

dropLabels would cause Anomify to remove the monitor label where its value is master, and the type label whatever its value, before recording the metrics. So this metric:

prometheus_http_requests_total{code="200",handler="/graph",instance="hycean.anomify.ai:443",job="prometheus",monitor="master",type="merge"}

would be recorded as:

prometheus_http_requests_total{_tenant_id="<x-tenant-id>",_server_id="<x-server-id>",code="200",handler="/graph",instance="hycean.anomify.ai:443",job="prometheus"}

With many metrics, Prometheus will often send samples that have no value. Anomify silently drops any sample that does not have a float value. These samples are not recorded in the dropped list, but they are recorded under the metrics_with_no_values key in the test results. Because these metrics do not go through the dropMetrics filters, they are not classified as either ingested or dropped. For example, Prometheus can send samples with no value:

'samples': [{'timestamp': '1675435920881'}]

Where valid samples have a value:

'samples': [{'value': 4171486.0, 'timestamp': '1675435920881'}]
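The rule described above can be sketched in Python as a filter that keeps only samples carrying a float value. The sample dictionaries mirror the two examples shown; the function name has_float_value is hypothetical and this is not Anomify's ingestion code.

```python
# Samples shaped like the two examples above.
samples = [
    {"timestamp": "1675435920881"},                      # no value: silently dropped
    {"value": 4171486.0, "timestamp": "1675435920881"},  # valid sample
]


def has_float_value(sample: dict) -> bool:
    """True if the sample carries a float 'value' field."""
    return isinstance(sample.get("value"), float)


kept = [s for s in samples if has_float_value(s)]
print(len(kept))  # 1
```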

Prometheus configuration examples

Example 1: Drop Metrics With Label

Say you wanted to drop all metrics whose job label has any of the values loki, prometheus, grafana or telegraf. You would define a drop action like this:

  write_relabel_configs:
  - source_labels: [job]
    regex: "(loki|prometheus|grafana|telegraf)"
    action: drop

The full remote_write config would look like this:

remote_write:
- url: <PROMETHEUS_WRITE_ENDPOINT>
  queue_config:
    max_samples_per_send: 1000
  write_relabel_configs:
  - source_labels: [job]
    regex: "(loki|prometheus|grafana|telegraf)"
    action: drop
  headers:
    key: <ANOMIFY_API_KEY>
    x-tenant-id: <ANOMIFY_ORG_ID>
    x-server-id: 1
    x-server-url: <YOUR_PROMETHEUS_URL>
    dropLabels: "[['monitor','master']]"
  metadata_config:
    send: true
    send_interval: 1m
    max_samples_per_send: 1000

Only metrics whose job label is NOT loki, prometheus, grafana or telegraf would be sent to Anomify.

Example 2: Keep Metrics With Label

Conversely, if you only want to send metrics whose job label has the value loki, you could define a keep action:

  write_relabel_configs:
  - source_labels: [job]
    regex: "(loki)"
    action: keep

The full remote_write config would look like this:

remote_write:
- url: <PROMETHEUS_WRITE_ENDPOINT>
  queue_config:
    max_samples_per_send: 1000
  write_relabel_configs:
  - source_labels: [job]
    regex: "(loki)"
    action: keep
  headers:
    key: <ANOMIFY_API_KEY>
    x-tenant-id: <ANOMIFY_ORG_ID>
    x-server-id: 1
    x-server-url: <YOUR_PROMETHEUS_URL>
    dropLabels: "[['monitor','master']]"
  metadata_config:
    send: true
    send_interval: 1m
    max_samples_per_send: 1000

Only metrics with the job value of loki would be sent to Anomify.

You may find that the following rule drops a lot of boolean and string metrics, as well as internal Go runtime (go_*) metrics, which in many cases reduces metric counts drastically.

  write_relabel_configs:
  - source_labels: [__name__]
    regex: "(^flag|^go_.*|^info_.*)"
    action: drop
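Because Prometheus anchors relabel regexes at both ends, the ^ characters in this rule are redundant and each alternative must match the entire metric name: ^flag drops only a metric named exactly flag, while go_.* and info_.* cover every name with those prefixes. The Python sketch below uses re.fullmatch to mimic that anchoring (an approximation of Prometheus's RE2 behaviour); the function name dropped_by_name and the sample metric names are hypothetical.

```python
import re

# Regex from the write_relabel_configs rule above, applied to __name__.
NAME_DROP = "(^flag|^go_.*|^info_.*)"


def dropped_by_name(name: str) -> bool:
    """True if the anchored regex matches the whole metric name."""
    return re.fullmatch(NAME_DROP, name) is not None


names = ["flag", "flag_enabled", "go_goroutines", "info_build", "http_requests_total"]
print([n for n in names if dropped_by_name(n)])  # ['flag', 'go_goroutines', 'info_build']
```

If you also wanted to drop names that merely start with flag, such as flag_enabled, the first alternative would need to be flag.* instead.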