Prometheus
Prometheus is a monitoring application and time-series database. Anomify can provide an overview of the health of your Prometheus metrics and notify you when they change unexpectedly.
You can send metrics from Prometheus to Anomify by adding a remote_write section to your configuration file. Here is an example configuration:
remote_write:
  - url: <PROMETHEUS_WRITE_ENDPOINT>
    queue_config:
      max_samples_per_send: 1000
    headers:
      key: <ANOMIFY_API_KEY>
      x-tenant-id: <ANOMIFY_ORG_ID>
      x-server-id: 1
      x-server-url: <YOUR_PROMETHEUS_URL>
      dropLabels: "[['monitor','master']]"
    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 1000
You will find the unique values for these variables on the Prometheus integrations settings page in your Anomify dashboard: PROMETHEUS_WRITE_ENDPOINT, ANOMIFY_API_KEY, ANOMIFY_ORG_ID.
YOUR_PROMETHEUS_URL is a reference to your Prometheus server. It does NOT have to be accessible to us; it could be http://127.0.0.1. We will use it in the future to create links directly to your own Prometheus metrics from the Anomify dashboard.
ANOMIFY_ORG_ID is the Anomify organisation ID you are given when you sign up for an account. Anomify uses this ID to segment your metrics from other organisations during analysis. _tenant_id="ANOMIFY_ORG_ID" is added as a label to your metrics when you view them in Anomify.
x-server-id will be 1 unless you have multiple Prometheus instances, in which case you will need to specify a unique identifier for each server. The _server_id= label will be appended to metrics and can be used to differentiate between metric names that are in all other respects identical.
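As a sketch of the multi-instance case, the headers on a second Prometheus server might differ only in x-server-id and x-server-url. The value 2 and the <SECOND_PROMETHEUS_URL> placeholder are illustrative assumptions; any identifier that is unique per server should serve the same purpose.

```yaml
# prometheus.yml on a second Prometheus server (illustrative values)
remote_write:
  - url: <PROMETHEUS_WRITE_ENDPOINT>
    headers:
      key: <ANOMIFY_API_KEY>
      x-tenant-id: <ANOMIFY_ORG_ID>
      # Must differ from the first server, which keeps x-server-id: 1
      x-server-id: 2
      # Hypothetical placeholder for this server's own URL
      x-server-url: <SECOND_PROMETHEUS_URL>
```

Metrics from this server would then carry a different _server_id label from those sent by the first server, even where the metric names and other labels are identical.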
headers
In the headers section you must provide:
headers:
  key: <ANOMIFY_API_KEY>
  x-tenant-id: <ANOMIFY_ORG_ID>
  x-server-id: 1
  x-server-url: <YOUR_PROMETHEUS_URL>
Optionally you can pass 3 other headers:
x-test-only allows you to send metrics to Anomify without Anomify ingesting and processing them. Instead, Anomify only reports which metrics would be processed and which would be dropped (see the Testing section below).
The following two headers are specific to Anomify and do NOT instruct Prometheus to do anything other than send the header. Prometheus will NOT drop anything itself; Anomify will. Prometheus will still send any data that matches these headers, which is why it is ultimately best to configure these rules in the Prometheus remote_write write_relabel_configs.
dropMetrics allows you to drop metrics that match certain patterns.
dropLabels allows you to drop certain labels and/or labels that have certain values. Each label and value pair is added to a list. For example, with dropLabels: "[['monitor','master'],['type', '*']]" the label monitor would be dropped where its value is master, and the label type would be dropped with any value.
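Because dropMetrics and dropLabels are applied by Anomify after the data has already been sent, a similar effect can be approximated on the Prometheus side. The following is a minimal sketch, assuming the intent is to remove the type label entirely and to remove the monitor label only where its value is master. It uses the standard Prometheus labeldrop and replace relabel actions and is not taken from Anomify's own configuration.

```yaml
write_relabel_configs:
  # Remove the type label whatever its value
  # (for labeldrop the regex is matched against label names)
  - regex: "type"
    action: labeldrop
  # Remove the monitor label only where its value is master:
  # setting a label to an empty value removes it
  - source_labels: [monitor]
    regex: "master"
    target_label: monitor
    replacement: ""
    action: replace
```

When removing labels this way, make sure the remaining labels still identify each series uniquely.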
Excluding Metrics
The Developer pricing tier has a limit of 1,000 metrics, after which metrics will be dropped automatically. We suggest you configure Prometheus to send only certain key metrics to begin with (e.g. number of requests, error counts, KPIs). This can be achieved using normal Prometheus write_relabel_configs to drop certain metrics.
You can also add exclude rules in the Anomify dashboard; however, it is ultimately better to send only those metrics that you want analysed, in order to reduce bandwidth, packet size and processing time.
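A keep rule on the __name__ label is one way to send only a handful of key metrics. The metric names in this sketch are hypothetical placeholders, not names that Anomify requires; substitute the metrics that matter to you.

```yaml
remote_write:
  - url: <PROMETHEUS_WRITE_ENDPOINT>
    write_relabel_configs:
      # Send only series whose metric name matches the regex
      # (the names are hypothetical); every other series is
      # dropped before it leaves Prometheus.
      - source_labels: [__name__]
        regex: "(http_requests_total|http_errors_total|orders_completed_total)"
        action: keep
    # headers and metadata_config as in the example configuration
```

Starting from a short keep list like this and widening it gradually tends to be easier to reason about than starting with everything and adding drop rules.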
Testing
We understand that creating write_relabel_configs to remote_write only certain metrics can be a daunting and difficult task. You can write a set of config rules that you think might work, but it is very difficult to know exactly which metrics will be sent. To help with this, we have a testing feature: you send metrics and Anomify only accepts the data, parses it and responds with which metrics were sent, which would be ingested and which would be dropped, without actually ingesting them.
Although these requests return a JSON response with all the data, Prometheus will not log the responses, even if --log.level=debug is added to the Prometheus startup command. The responses could be intercepted with tcpdump or similar, but you can access the test results via the Anomify dashboard.
The JSON response returned from the Anomify dashboard holds its information under the ['data']['flux_test_metrics'] key, with metric data from a 5 minute rolling period:
['data']['flux_test_metrics']['metrics'] - a list of metrics that would be ingested (in last 5 minutes)
['data']['flux_test_metrics']['ingest_count'] - the number of metrics that would be ingested (in last 5 minutes)
['data']['flux_test_metrics']['dropped'] - a list of metrics that would be dropped because they match something in dropMetrics (in last 5 minutes)
['data']['flux_test_metrics']['drop_count'] - the number of metrics that would be dropped (in last 5 minutes)
['data']['flux_test_metrics']['metrics_with_no_values'] - a list of metrics that have been sent with no sample values (in last 5 minutes)
['data']['flux_test_metrics']['no_values_count'] - the number of metrics that have been sent with no sample values (in last 5 minutes)
['data']['flux_test_metrics']['metric_details'] - a very detailed breakdown of which metric names, labels and values exist and which exist in each
['data']['flux_test_metrics'][<TIMESTAMP>] - the data from each minute period in the 5 minute rolling window.
Testing Example
To run in test-only mode, add x-test-only: true to the remote_write headers.
Remember that if you send some test metrics, evaluate the results, change your Prometheus config and start sending again, you need to wait for a new 5 minute rolling window to be populated before deciding whether the changes had the desired effect. You can make quicker checks during that 5 minute period by looking at the ['data']['flux_test_metrics'][<TIMESTAMP>] keys, but only make a final decision once the full 5 minute window confirms your changes.
remote_write:
  - url: <PROMETHEUS_WRITE_ENDPOINT>
    queue_config:
      max_samples_per_send: 1000
    write_relabel_configs:
      - source_labels: [job]
        regex: "(loki|grafana|telegraf)"
        action: drop
      - source_labels: [job]
        regex: "(prometheus)"
        action: keep
      - source_labels: [instance]
        regex: "(hycean.anomify.ai:443|toi700e.anomify.ai:443)"
        action: keep
    headers:
      key: <ANOMIFY_API_KEY>
      x-test-only: true
      x-tenant-id: <ANOMIFY_ORG_ID>
      x-server-id: 1
      x-server-url: <YOUR_PROMETHEUS_URL>
      dropMetrics: "['localhost:9100']"
      dropLabels: "[['monitor','master'],['type', '*']]"
    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 1000
With the above, Prometheus would not send ANY metrics that have a job label value of loki, grafana or telegraf, and would keep prometheus (although once a job keep rule is present, all other job values are dropped anyway; this just shows some example patterns).
Prometheus would send metrics that have a job label of prometheus and an instance label of hycean.anomify.ai:443 or toi700e.anomify.ai:443.
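The two keep rules in the Testing Example can also be written as a single rule, which makes the combined condition explicit. This is a sketch, not a recommended configuration. It relies on Prometheus joining the source_labels values with the default separator ; and anchoring the regex at both ends. Single quotes are used so that the escaped dots reach the regex engine unchanged.

```yaml
write_relabel_configs:
  # job and instance are joined as "<job>;<instance>" before matching,
  # so only series with job prometheus on one of the two instances are kept
  - source_labels: [job, instance]
    regex: 'prometheus;(hycean\.anomify\.ai:443|toi700e\.anomify\.ai:443)'
    action: keep
```

Escaping the dots is optional, but an unescaped dot matches any character, so the escaped form is slightly stricter than the pattern in the example.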
dropMetrics would cause Anomify to drop any metrics that have the string localhost:9100 in them.
dropLabels would cause Anomify to remove the monitor label where it has the value master, and the type label with any value, before recording the metrics. Anomify would not record those labels, so:
prometheus_http_requests_total{code="200",handler="/graph",instance="hycean.anomify.ai:443",job="prometheus",monitor="master",type="merge"}
Would become:
prometheus_http_requests_total{_tenant_id="<x-tenant-id>",_server_id="<x-server-id>",code="200",handler="/graph",instance="hycean.anomify.ai:443",job="prometheus"}
Often, when there are many metrics, Prometheus will send through samples with no values. Anomify silently drops any sample that does not have a float value. These are not recorded in the dropped list, but they are recorded under the metrics_with_no_values key in the test results. These metrics do not go through the dropMetrics filters, so they are not classified as to whether they would be ingested or dropped. For example, Prometheus can send samples with no values:
Where valid samples have a value:
Prometheus configuration examples
Example 1: Drop Metrics With Label
Say you wanted to drop all metrics that have the label job with a value of loki, prometheus, grafana or telegraf. You would define a drop action like this:
write_relabel_configs:
  - source_labels: [job]
    regex: "(loki|prometheus|grafana|telegraf)"
    action: drop
The full remote_write config would look like this:
remote_write:
  - url: <PROMETHEUS_WRITE_ENDPOINT>
    queue_config:
      max_samples_per_send: 1000
    write_relabel_configs:
      - source_labels: [job]
        regex: "(loki|prometheus|grafana|telegraf)"
        action: drop
    headers:
      key: <ANOMIFY_API_KEY>
      x-tenant-id: <ANOMIFY_ORG_ID>
      x-server-id: 1
      x-server-url: <YOUR_PROMETHEUS_URL>
      dropLabels: "[['monitor','master']]"
    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 1000
Only metrics whose job label is NOT loki, prometheus, grafana or telegraf would be sent to Anomify.
Example 2: Keep Metrics With Label
Conversely, if you only want to send metrics that have the label job with the value loki, you could define a keep action.
The full remote_write config would look like this:
remote_write:
  - url: <PROMETHEUS_WRITE_ENDPOINT>
    queue_config:
      max_samples_per_send: 1000
    write_relabel_configs:
      - source_labels: [job]
        regex: "(loki)"
        action: keep
    headers:
      key: <ANOMIFY_API_KEY>
      x-tenant-id: <ANOMIFY_ORG_ID>
      x-server-id: 1
      x-server-url: <YOUR_PROMETHEUS_URL>
      dropLabels: "[['monitor','master']]"
    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 1000
Only metrics with the job value of loki would be sent to Anomify.
You may find that the following drops a lot of boolean and string metrics and internal go_ type metrics, which can reduce metric counts drastically in many cases.