[Problem] How can I improve Tempo query performance? #4239

@g3david405

I am using the latest version of tempo-distributed (v2.6.1). My ingest rate is approximately 1,000 records per second, and with a retention period of 21 days the stored data totals around 900 GB. When running TraceQL queries, I'm hitting significant performance bottlenecks, especially when filtering on span or resource attributes.
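To put these figures in perspective, here is a back-of-the-envelope sizing sketch. It assumes that "records" means spans, that the ingest rate is steady over the whole retention window, and that 1 GB = 10^9 bytes; none of this is measured data.

```python
# Rough sizing from the figures quoted in the issue.
# Assumptions: "records" = spans, constant ingest rate, 1 GB = 10**9 bytes.
RECORDS_PER_SECOND = 1_000
RETENTION_DAYS = 21
TOTAL_BYTES = 900 * 10**9

SECONDS_PER_DAY = 24 * 60 * 60


def sizing(records_per_second: int, retention_days: int, total_bytes: int) -> dict:
    """Derive retained record count, daily volume and average record size."""
    records_retained = records_per_second * SECONDS_PER_DAY * retention_days
    return {
        "records_retained": records_retained,
        "gb_per_day": total_bytes / retention_days / 10**9,
        "bytes_per_record": total_bytes / records_retained,
    }


if __name__ == "__main__":
    s = sizing(RECORDS_PER_SECOND, RETENTION_DAYS, TOTAL_BYTES)
    print(f"records retained: {s['records_retained']:,}")
    print(f"stored per day:   {s['gb_per_day']:.1f} GB")
    print(f"bytes per record: {s['bytes_per_record']:.0f}")
```

Under these assumptions, the backend holds about 1.8 billion records, grows by roughly 42.9 GB per day, and averages about 496 bytes per stored record.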

Following this article:
https://grafana.com/docs/tempo/latest/operations/backend_search/
I have implemented the following improvements so far:

  1. Using the vParquet4 block format and configuring dedicated_columns for specific span and resource attributes.
  2. Enabling stream_over_http_enabled so that Grafana can run queries via streaming.
  3. Scaling out the querier to 6 replicas.
  4. Tuning the querier's max_concurrent_queries and the query frontend's concurrent_jobs.
  5. Adding an explicit scope to attribute conditions in TraceQL, for example:
    .http.request.method = "GET" → span.http.request.method = "GET"
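Items 1 and 5 work together: an unscoped attribute (`.name`) can match either a span or a resource attribute, so both scopes have to be checked, while a scoped one (`span.name`) narrows the lookup to one scope. The toy Python model below illustrates this using the dedicated columns from the values file in this issue. It is a simplified assumption for illustration only, not Tempo's actual query planner, and the helper name `lookup_plan` is hypothetical.

```python
# Toy model (NOT Tempo's real planner) of where a TraceQL attribute
# reference would be looked up, given the dedicated columns configured
# in this issue's Helm values as (scope, attribute name) pairs.
DEDICATED_COLUMNS = {
    ("resource", "service.name"),
    ("resource", "k8s.namespace.name"),
    ("span", "url.path"),
    ("span", "http.route"),
    ("span", "http.target"),
    ("span", "http.request.method"),
    ("span", "http.response.status_code"),
    ("span", "db.name"),
    ("span", "db.system"),
    ("span", "peer.service"),
}


def lookup_plan(attr_ref: str) -> dict[str, str]:
    """Hypothetical helper: map each scope to check onto its storage kind."""
    if attr_ref.startswith("."):
        # Unscoped (.name): could be a span OR a resource attribute,
        # so this model checks both scopes.
        name = attr_ref[1:]
        scopes = ["span", "resource"]
    else:
        # Scoped (span.name / resource.name): only that scope is checked.
        scope, _, name = attr_ref.partition(".")
        scopes = [scope]
    return {
        scope: (
            "dedicated column"
            if (scope, name) in DEDICATED_COLUMNS
            else "generic attribute storage"
        )
        for scope in scopes
    }


print(lookup_plan(".http.request.method"))
print(lookup_plan("span.http.request.method"))
```

In this model, `.http.request.method` resolves to `{'span': 'dedicated column', 'resource': 'generic attribute storage'}`, while `span.http.request.method` resolves to `{'span': 'dedicated column'}` only, which is the intuition behind adding the scope.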

Despite these adjustments, query performance is still below acceptable levels. Are there any additional optimizations I could make?

My Helm chart values are as follows:

```yaml
tempo:
  structuredConfig:
    stream_over_http_enabled: true

metricsGenerator:
  enabled: true
  config:
    storage:
      remote_write:
        - url: http://prometheus-server.prometheus.svc.cluster.local/api/v1/write
          send_exemplars: true

ingester:
  resources:
    limits:
      memory: 8Gi

queryFrontend:
  replicas: 2
  config:
    max_outstanding_per_tenant: 2000
    search:
      concurrent_jobs: 100
      target_bytes_per_job: 52428800  # 50 MiB

querier:
  replicas: 6
  resources:
    limits:
      memory: 10Gi
  config:
    search:
      query_timeout: 60s
    max_concurrent_queries: 30

compactor:
  replicas: 3
  config:
    compaction:
      block_retention: 504h  # 21 days

distributor:
  replicas: 3

traces:
  otlp:
    http:
      enabled: true
    grpc:
      enabled: true

storage:
  trace:
    block:
      version: vParquet4
      dedicated_columns:
        - name: service.name
          type: string
          scope: resource
        - name: k8s.namespace.name
          type: string
          scope: resource
        - name: url.path
          type: string
          scope: span
        - name: http.route
          type: string
          scope: span
        - name: http.target
          type: string
          scope: span
        - name: http.request.method
          type: string
          scope: span
        - name: http.response.status_code
          type: string
          scope: span
        - name: db.name
          type: string
          scope: span
        - name: db.system
          type: string
          scope: span
        - name: peer.service
          type: string
          scope: span
    backend: s3
    s3:
      access_key: 'xxx'
      secret_key: 'xxx'
      bucket: 'tempo-bucket'
      endpoint: 'minio.tenant.svc.cluster.local'
      insecure: true

global_overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics

server:
  http_server_read_timeout: 2m
  http_server_write_timeout: 2m
  grpc_server_max_recv_msg_size: 16777216  # 16 MiB
  grpc_server_max_send_msg_size: 16777216  # 16 MiB
```
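To see how the values above interact, here is a rough estimate of how many backend search jobs a query produces for different time ranges. It is a sketch under loudly stated assumptions: the job count is approximated as (bytes in the searched range) / `target_bytes_per_job`, data is assumed to be spread evenly over the 21-day retention, and a single query is assumed to have at most min(`concurrent_jobs`, querier replicas × `max_concurrent_queries`) jobs in flight. Real sharding in Tempo depends on block layout and on other queries competing for querier capacity.

```python
import math

# Values taken from the issue; the job model itself is an assumption.
TOTAL_BYTES = 900 * 10**9          # ~900 GB retained (from the issue text)
RETENTION_HOURS = 504              # compactor block_retention: 504h = 21 days
TARGET_BYTES_PER_JOB = 52_428_800  # queryFrontend search.target_bytes_per_job (50 MiB)
CONCURRENT_JOBS = 100              # queryFrontend search.concurrent_jobs
QUERIER_REPLICAS = 6               # querier replicas
MAX_CONCURRENT_QUERIES = 30        # querier max_concurrent_queries (per querier)


def estimate(range_hours: float) -> dict:
    """Approximate jobs, parallelism and sequential 'waves' for one search."""
    # Assume data is evenly distributed across the retention window.
    bytes_in_range = TOTAL_BYTES * range_hours / RETENTION_HOURS
    jobs = math.ceil(bytes_in_range / TARGET_BYTES_PER_JOB)
    # Parallelism for one query is capped by the frontend limit and by
    # the total number of querier worker slots, whichever is smaller.
    in_flight = min(CONCURRENT_JOBS, QUERIER_REPLICAS * MAX_CONCURRENT_QUERIES)
    return {"jobs": jobs, "in_flight": in_flight, "waves": math.ceil(jobs / in_flight)}


for hours in (1, 24, RETENTION_HOURS):
    print(hours, estimate(hours))
```

Under these assumptions, a 1-hour search is about 35 jobs, a 24-hour search about 818 jobs (9 waves of 100), and a full 21-day search about 17,167 jobs (172 waves), so the searched time range is likely to dominate latency with this configuration.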
