opentelemetry and Jaeger (experimental)
Using a combination of opentelemetry related tools and Jaeger, you can send traces to Anomify. Anomify converts the traces into metrics that describe the traces themselves.
You can configure an app or apps to use the opentelemetry.exporter.jaeger.thrift JaegerExporter to send trace data to the opentelemetry collector (otelcol).
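As a rough sketch of such a set up in a Python app, assuming the opentelemetry-sdk and opentelemetry-exporter-jaeger-thrift packages are installed and that otelcol accepts Jaeger thrift_compact data on 127.0.0.1:26831 (the function name configure_tracing and the service name are hypothetical):

```python
# Assumed address of the otelcol jaeger thrift_compact receiver.
AGENT_HOST = "127.0.0.1"
AGENT_PORT = 26831


def configure_tracing(service_name, agent_host=AGENT_HOST, agent_port=AGENT_PORT):
    """Return a tracer whose spans are exported to otelcol over Jaeger thrift."""
    # Imported here so this module can be loaded without the packages installed.
    from opentelemetry import trace
    from opentelemetry.exporter.jaeger.thrift import JaegerExporter
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    provider = TracerProvider(resource=Resource.create({SERVICE_NAME: service_name}))
    exporter = JaegerExporter(agent_host_name=agent_host, agent_port=agent_port)
    # Batch spans in the app before handing them to the exporter.
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    return trace.get_tracer(__name__)


# Example use (hypothetical service and span names):
#   tracer = configure_tracing("my-app")
#   with tracer.start_as_current_span("app/redis/keys"):
#       ...  # the code to be traced
```

The imports are deferred into the function only so that the sketch can be loaded and inspected on a machine without the opentelemetry packages; in a real app they would normally sit at the top of the module.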
Anomify can ingest opentelemetry OTLP trace data, converting service/method timings and trace counts into aggregated metrics, with a view to handling OTLP metrics and perhaps logging as well in the future.
Should you wish to experiment with trace to metric conversions in your own apps, first assess the total number of Services and Operations in Jaeger. Be very aware that instrumentation telemetry can become high cardinality if something in your application design changes.
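One way to make that assessment is to count the unique service and operation pairs known to Jaeger. The sketch below assumes the Jaeger query service is reachable on port 16686 and uses the HTTP paths /api/services and /api/services/<service>/operations that the Jaeger UI itself relies on; these are internal to Jaeger and not a stable public API, and the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_operations(jaeger_url="http://127.0.0.1:16686"):
    """Return {service: [operation, ...]} from the Jaeger query service."""
    def get(path):
        with urlopen(jaeger_url + path, timeout=10) as response:
            return json.load(response).get("data") or []

    return {
        service: get("/api/services/%s/operations" % quote(service, safe=""))
        for service in get("/api/services")
    }


def count_cardinality(operations_by_service):
    """Return (number of services, number of unique service/operation pairs)."""
    pairs = {
        (service, operation)
        for service, operations in operations_by_service.items()
        for operation in operations
    }
    return len(operations_by_service), len(pairs)
```

Calling count_cardinality(fetch_operations()) gives a rough upper bound on the number of per-method series that the traces could turn into.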
Although your app can send traces directly to Jaeger, to send trace data to Anomify the opentelemetry collector must be used as a router that sends trace data to both Jaeger and Anomify.
On receiving trace data, Anomify automatically aggregates the traces and records the trace timings and trace counts per method as metrics. This allows significant changes to be identified in either the time taken for a method to complete or the number of traces in a method, e.g. the number of methods called by a method.
For example, say the app/redis/keys method makes on average 8 GET Redis method calls. If a change is introduced to determine the size of each key returned, and this results in the method making 116 GET Redis method calls, Anomify will trigger an anomaly and alert on that change. The same is true for method timings.
The objective here is to monitor the behaviour of the methods in the traces rather than the frequency of the methods.
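To illustrate the idea only (this is not Anomify's implementation), the sketch below aggregates a batch of simplified spans into a call count, a mean duration and a mean number of direct child calls per method. The span dictionaries, their field names and the assumption that span ids are unique within the batch are all made up for the example.

```python
from collections import defaultdict


def aggregate_spans(spans):
    """Return {method: {"count", "avg_duration_ms", "avg_child_calls"}}."""
    # Count the direct children of every span.
    children = defaultdict(int)
    for span in spans:
        if span["parent_id"] is not None:
            children[span["parent_id"]] += 1

    # Sum counts, durations and child calls per method name.
    totals = defaultdict(lambda: {"count": 0, "duration_ms": 0.0, "child_calls": 0})
    for span in spans:
        entry = totals[span["name"]]
        entry["count"] += 1
        entry["duration_ms"] += span["duration_ms"]
        entry["child_calls"] += children[span["span_id"]]

    return {
        name: {
            "count": entry["count"],
            "avg_duration_ms": entry["duration_ms"] / entry["count"],
            "avg_child_calls": entry["child_calls"] / entry["count"],
        }
        for name, entry in totals.items()
    }
```

With metrics of this shape, a jump in avg_child_calls for app/redis/keys from 8 to 116 shows up as a change in a single series, regardless of how often the method itself is called.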
It must be stated that this is suited to situations where traces occur fairly frequently, at least a few times a day. If trace data is sent less than once per day, this analysis method will not detect changes in those methods as there is insufficient data.
Versions
Anomify only supports the following versions of the opentelemetry related libraries. Although these are Python based, other language implementations and instrumentation use the same versioning, so any in this version range can be used.
- opentelemetry-api >= 1.10.0 <= 1.15.0
- opentelemetry-distro >= 0.29b0 <= 0.36b0
- opentelemetry-exporter-jaeger >= 1.10.0 <= 1.15.0
- opentelemetry-exporter-jaeger-proto-grpc >= 1.10.0 <= 1.15.0
- opentelemetry-exporter-jaeger-thrift >= 1.10.0 <= 1.15.0
- opentelemetry-instrumentation >= 0.29b0 <= 0.36b0
- opentelemetry-proto >= 1.10.0 <= 1.15.0
- opentelemetry-sdk >= 1.10.0 <= 1.15.0
- opentelemetry-semantic-conventions >= 0.29b0 <= 0.36b0
- opentelemetry-util-http >= 0.29b0 <= 0.36b0
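A quick way to check a Python environment against these ranges is sketched below. It uses only the standard library, compares just the numeric parts of each version (so 0.29b0 is treated as 0.29.0, which is an approximation), and covers only a subset of the packages listed; the helper names are hypothetical.

```python
import re
from importlib import metadata

# A subset of the supported ranges listed above: package -> (lowest, highest).
SUPPORTED = {
    "opentelemetry-api": ("1.10.0", "1.15.0"),
    "opentelemetry-sdk": ("1.10.0", "1.15.0"),
    "opentelemetry-exporter-jaeger-thrift": ("1.10.0", "1.15.0"),
    "opentelemetry-instrumentation": ("0.29b0", "0.36b0"),
}


def version_key(version):
    """Turn '1.10.0' or '0.29b0' into a comparable tuple of three ints."""
    numbers = [int(n) for n in re.findall(r"\d+", version)][:3]
    return tuple(numbers + [0] * (3 - len(numbers)))


def in_range(version, low, high):
    """Return True if low <= version <= high, comparing numeric parts only."""
    return version_key(low) <= version_key(version) <= version_key(high)


def check_installed(supported=SUPPORTED):
    """Return {package: installed version} for packages outside their range.

    Packages that are not installed are reported with a value of None.
    """
    problems = {}
    for package, (low, high) in supported.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems[package] = None
            continue
        if not in_range(installed, low, high):
            problems[package] = installed
    return problems
```

An empty dictionary from check_installed() means every package checked is installed and within its range.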
At times opentelemetry can make significant breaking changes between one version and the next, like changing the format of a span. Anomify therefore only supports this version range and may need to implement breaking changes due to upstream requirements in order to upgrade.
Configuration
The following requires Jaeger and otelcol to be running. If you already have your own otelcol or Jaeger set up, you can adapt the configuration to suit it.
Importantly, you must ensure that the OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES ENV variable is set, to prevent the opentelemetry.exporter.jaeger.thrift JaegerExporter from warning that Data exceeds the max UDP packet size and losing data on large traces.
Therefore ensure that the ENV variable is set for the app that is sending trace data to Jaeger.
If the app is started by systemd, set it with an Environment= directive in the [Service] section of the systemd unit file for the app.
Otherwise, ensure that the ENV variable is set and accessible to your app in the environment it runs in.
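As a defensive measure, a Python app can also check the variable itself before the exporter is created. This is a minimal sketch, assuming that the exporter reads the variable when it is constructed and that a value of True enables splitting; both the value and the helper name are assumptions, so confirm them against the exporter version you run.

```python
import os

SPLIT_ENV = "OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES"


def ensure_split_oversized_batches(environ=os.environ, value="True"):
    """Set the variable if it is missing and return the value in effect."""
    # setdefault leaves a value already set by systemd or the shell untouched.
    return environ.setdefault(SPLIT_ENV, value)
```

Call ensure_split_oversized_batches() before constructing the JaegerExporter so that a missing variable does not go unnoticed.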
The general installation of otelcol and Jaeger is not covered here; see the respective documentation for each: https://www.jaegertracing.io/docs and https://opentelemetry.io/docs/collector/
Anomify specific configurations for otelcol and Jaeger are as follows.
This is an example /etc/otelcol/config.yaml for otelcol to achieve the set up described above.
# This is an example /etc/otelcol/config.yaml for a set up where otelcol and
# Jaeger are running on the same machine. This config is mostly the default
# config.yaml but it will send traces to Jaeger and Anomify via an otlphttp
# exporter. Unnecessary protocols have been commented out.
# This config assumes that you have the following set up on the same instance.
# - A default Jaeger all-in-one instance running in memory mode on the
#   same instance, but it can be any type of Jaeger e.g.
#   /opt/jaeger/jaeger-1.32.0-linux-amd64/jaeger-all-in-one --collector.zipkin.host-port=:9411
# - otelcol installed on the same instance
extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679
receivers:
  otlp:
    protocols:
      grpc:
      http:
  # opencensus:
  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
  jaeger:
    protocols:
      # grpc:
      # thrift_binary:
      #   endpoint: '127.0.0.1:26832'
      thrift_compact:
        endpoint: '127.0.0.1:26831'
      # thrift_http:
  # zipkin:
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 500
    spike_limit_mib: 250
  batch:
    send_batch_size: 1000
    timeout: 30s
exporters:
  logging:
    logLevel: debug
  # Data sources: traces
  jaeger:
    # endpoint: "jaeger-all-in-one:14250"
    # endpoint: '127.0.0.1:6831'
    endpoint: '127.0.0.1:14250'
    tls:
      insecure: true
  # This otlphttp exporter sends to Anomify
  otlphttp:
    traces_endpoint: 'https://<YOUR_ANOMIFY_HOST>/flux/otel/trace/v1'
    tls:
      insecure: true
    compression: gzip
    headers:
      otlp: true
      key: <ANOMIFY_API_KEY>
service:
  pipelines:
    traces:
      # receivers: [otlp, opencensus, jaeger, zipkin]
      receivers:
        - otlp
        - jaeger
      processors:
        - memory_limiter
        - batch
      # exporters: [logging, jaeger]
      exporters:
        - jaeger
        - otlphttp
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
  extensions: [health_check, pprof, zpages]
Jaeger does not require any Anomify specific configuration.