opentelemetry and Jaeger (experimental)

Using a combination of opentelemetry related tools and Jaeger, you can send traces to Anomify. Anomify converts the traces into metrics that describe the traces themselves.

You can configure one or more apps to use the opentelemetry.exporter.jaeger.thrift JaegerExporter to send trace data to the opentelemetry collector (otelcol).
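As a minimal sketch of what that app-side set up might look like in Python, the following registers a tracer provider that exports spans through the thrift JaegerExporter. The function name configure_jaeger_tracing and the service name are hypothetical, and the agent port 26831 is an assumption that matches the thrift_compact receiver in the example otelcol config in this document; adjust it to your own collector.

```python
def configure_jaeger_tracing(service_name, agent_host="127.0.0.1", agent_port=26831):
    """Register a TracerProvider that exports spans via the Jaeger thrift exporter.

    agent_host and agent_port are assumptions: they should point at the
    otelcol jaeger thrift_compact receiver, not at Jaeger itself.
    """
    # Imported here so the opentelemetry packages are only required when
    # tracing is actually enabled.
    from opentelemetry import trace
    from opentelemetry.exporter.jaeger.thrift import JaegerExporter
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    provider = TracerProvider(resource=Resource.create({SERVICE_NAME: service_name}))
    exporter = JaegerExporter(agent_host_name=agent_host, agent_port=agent_port)
    # BatchSpanProcessor buffers finished spans and exports them in batches.
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    return trace.get_tracer(service_name)
```

A caller would then wrap work in spans, for example `tracer = configure_jaeger_tracing("my-app")` followed by `with tracer.start_as_current_span("app/redis/keys"): ...`.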

Anomify can ingest opentelemetry OTLP trace data, converting service/method timings and trace counts into aggregated metrics. Support for OTLP metrics, and perhaps logging as well, may follow in the future.

Should you wish to experiment with trace to metric conversions in your own app/s, first assess the total number of Services and Operations in Jaeger. Be very aware that instrumentation telemetry can become high cardinality if something in your application design changes.
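One rough way to reason about that cardinality is to count the distinct service and operation pairs seen in a sample of spans. The sketch below assumes each span is a plain dict with hypothetical "service" and "operation" keys; it does not use any Jaeger or Anomify API.

```python
from collections import Counter


def count_service_operations(spans):
    """Return (total distinct pairs, {service: number of distinct operations})."""
    # Each distinct (service, operation) pair is a potential metric series.
    pairs = {(span["service"], span["operation"]) for span in spans}
    per_service = Counter(service for service, _ in pairs)
    return len(pairs), dict(per_service)
```

If operation names embed values such as IDs or full URLs, the pair count grows with traffic rather than with code paths, which is the high cardinality case to watch for.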

Although your app can send traces directly to Jaeger, to send trace data to Anomify the opentelemetry collector must be used as a router that sends trace data to both Jaeger and Anomify.

App -> opentelemetry JaegerExporter -> otelcol |-> Jaeger
                                               |-> Anomify

On receiving trace data, Anomify automatically aggregates traces and records the trace timings and trace counts per method as metrics. This allows for the identification of significant changes in either the time taken for a method to complete or the number of traces in a method, e.g. the number of methods called by a method.

For example, let us say the app/redis/keys method makes on average 8 GET Redis method calls. A change is then introduced to determine the size of each key returned, which results in the method making 116 GET Redis method calls. Anomify will trigger an anomaly and alert on that change. The same is true for method timings.
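To make the idea of trace to metric conversion concrete, here is a small illustrative sketch of that kind of aggregation. It is not Anomify's implementation; the span dict keys (trace_id, span_id, parent_id, name, duration_ms) and the function name aggregate_spans are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_spans(spans):
    """Aggregate spans into per-method call counts, timings and child span counts."""
    # Count the direct children of every span, keyed by (trace_id, span_id)
    # because span ids are only unique within a trace.
    children = defaultdict(int)
    for span in spans:
        if span["parent_id"] is not None:
            children[(span["trace_id"], span["parent_id"])] += 1

    durations = defaultdict(list)
    child_counts = defaultdict(list)
    for span in spans:
        durations[span["name"]].append(span["duration_ms"])
        child_counts[span["name"]].append(
            children.get((span["trace_id"], span["span_id"]), 0)
        )

    return {
        name: {
            "calls": len(values),
            "mean_duration_ms": mean(values),
            "mean_child_spans": mean(child_counts[name]),
        }
        for name, values in durations.items()
    }
```

In the example above, a jump in mean_child_spans for app/redis/keys from around 8 to 116 is the kind of change that would stand out in the resulting metric.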

The objective here is to monitor the behaviour of the methods in the traces rather than the frequency of the methods.

It must be stated that this is suited to situations where traces are sent fairly frequently, at least a few times a day. If trace data is sent less than once per day, this analysis method will not detect changes in those methods as there is insufficient data.

Versions

Anomify only supports the following versions of the opentelemetry related libraries. While these are Python based, other language implementations and instrumentation use the same versioning, so any in this version range can be used.

  • opentelemetry-api >= 1.10.0 <= 1.15.0
  • opentelemetry-distro >= 0.29b0 <= 0.36b0
  • opentelemetry-exporter-jaeger >= 1.10.0 <= 1.15.0
  • opentelemetry-exporter-jaeger-proto-grpc >= 1.10.0 <= 1.15.0
  • opentelemetry-exporter-jaeger-thrift >= 1.10.0 <= 1.15.0
  • opentelemetry-instrumentation >= 0.29b0 <= 0.36b0
  • opentelemetry-proto >= 1.10.0 <= 1.15.0
  • opentelemetry-sdk >= 1.10.0 <= 1.15.0
  • opentelemetry-semantic-conventions >= 0.29b0 <= 0.36b0
  • opentelemetry-util-http >= 0.29b0 <= 0.36b0

At times opentelemetry makes significant breaking changes between one version and the next, like changing the format of a span. Anomify therefore only supports one version range and may need to implement breaking changes due to upstream requirements in order to upgrade.
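A quick way to check a dependency pin against the ranges listed above is sketched below. The helper names version_key and is_supported are hypothetical, and the parser only handles the two version shapes that appear in the list (for example 1.12.0 and 0.33b0); it is not a general version parser.

```python
import re

# Supported (lowest, highest) versions, copied from the list above.
SUPPORTED_RANGES = {
    "opentelemetry-api": ("1.10.0", "1.15.0"),
    "opentelemetry-distro": ("0.29b0", "0.36b0"),
    "opentelemetry-exporter-jaeger": ("1.10.0", "1.15.0"),
    "opentelemetry-exporter-jaeger-proto-grpc": ("1.10.0", "1.15.0"),
    "opentelemetry-exporter-jaeger-thrift": ("1.10.0", "1.15.0"),
    "opentelemetry-instrumentation": ("0.29b0", "0.36b0"),
    "opentelemetry-proto": ("1.10.0", "1.15.0"),
    "opentelemetry-sdk": ("1.10.0", "1.15.0"),
    "opentelemetry-semantic-conventions": ("0.29b0", "0.36b0"),
    "opentelemetry-util-http": ("0.29b0", "0.36b0"),
}

_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(?:b(\d+))?")


def version_key(version):
    """Turn '1.12.0' or '0.33b0' into a tuple that sorts in release order."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, beta = match.groups()
    # A beta release sorts before the final release with the same numbers.
    is_final = 1 if beta is None else 0
    return (int(major), int(minor), int(patch or 0), is_final, int(beta or 0))


def is_supported(package, version):
    """Return True if version falls inside the supported range for package."""
    lowest, highest = SUPPORTED_RANGES[package]
    return version_key(lowest) <= version_key(version) <= version_key(highest)
```

For example, `is_supported("opentelemetry-sdk", "1.16.0")` returns False because 1.16.0 is above the listed upper bound.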

Configuration

To achieve the following you need Jaeger and otelcol running. If you already have your own otelcol or Jaeger set up, you can adapt this configuration to suit it.

Importantly, you must ensure that the OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES ENV variable is set, to prevent the opentelemetry.exporter.jaeger.thrift JaegerExporter from warning that Data exceeds the max UDP packet size and losing data on large traces.

Therefore ensure that the ENV variable is set for the app that is sending trace data to Jaeger.

If the app is started by systemd add the following to the systemd file:

Environment=OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES=True

Or just ensure that the ENV variable is set and accessible to your app:

export OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES="True"
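If you want the app itself to confirm the variable is visible at start up, a small check along these lines could be used. The function name split_oversized_batches_enabled is hypothetical, and the check only tests for the value True recommended here; it makes no claim about how the exporter itself interprets other values.

```python
import os

ENV_VAR = "OTEL_EXPORTER_JAEGER_AGENT_SPLIT_OVERSIZED_BATCHES"


def split_oversized_batches_enabled(environ=None):
    """Return True if the ENV variable is present and set to 'True' (any case)."""
    # Allow a mapping to be passed in for testing; default to the real environment.
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "").strip().lower() == "true"
```

Calling this before the tracer is configured, and logging a warning when it returns False, makes a missing systemd Environment line easier to spot than the UDP packet size warning alone.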

The general installation of otelcol and Jaeger is not covered here. See the respective documentation for each: https://www.jaegertracing.io/docs and https://opentelemetry.io/docs/collector/

Anomify specific configurations for otelcol and Jaeger are as follows.

This is an example /etc/otelcol/config.yaml for otelcol to achieve the set up described above.

# This is an example /etc/otelcol/config.yaml for a set up where otelcol and
# Jaeger are running on the same machine.  This config is mostly the default
# config.yaml but it will send traces to Jaeger and Anomify via an otlphttp
# exporter.  Unnecessary protocols have been commented out.
# This config assumes that you have the following set up on the same instance.
# - A default Jaeger all-in-one instance running in memory mode on the
#   same instance, but it can be any type of Jaeger e.g.
#   /opt/jaeger/jaeger-1.32.0-linux-amd64/jaeger-all-in-one --collector.zipkin.host-port=:9411
# - otelcol installed on the same instance
extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:

  # opencensus:

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      # grpc:
      #thrift_binary:
      #  endpoint: '127.0.0.1:26832'
      thrift_compact:
        endpoint: '127.0.0.1:26831'
      # thrift_http:

  # zipkin:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 500
    spike_limit_mib: 250
  batch:
    send_batch_size: 1000
    timeout: 30s

exporters:
  logging:
    logLevel: debug
  # Data sources: traces
  jaeger:
    # endpoint: "jaeger-all-in-one:14250"
    #endpoint: '127.0.0.1:6831'
    endpoint: '127.0.0.1:14250'
    tls:
      insecure: true
  # This otlphttp exporter sends to Anomify
  otlphttp:
    traces_endpoint: 'https://<YOUR_ANOMIFY_HOST>/flux/otel/trace/v1'
    tls:
      insecure: true
    compression: gzip
    headers:
      otlp: true
      key: <ANOMIFY_API_KEY>

service:

  pipelines:

    traces:
      # receivers: [otlp, opencensus, jaeger, zipkin]
      receivers:
        - otlp
        - jaeger
      processors:
        - memory_limiter
        - batch
      #exporters: [logging, jaeger]
      exporters:
        - jaeger
        - otlphttp

    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]

  extensions: [health_check, pprof, zpages]

No Anomify specific configuration of Jaeger is required.