Closed as not planned
Labels: stale
Description
I am using the latest version of tempo-distributed (v2.6.1), and my data volume is approximately 1,000 records per second, with a retention period of 21 days totaling around 900 GB. When performing TraceQL queries, I’m encountering significant performance bottlenecks, especially when querying span or resource attributes.
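For orientation, the figures above can be turned into a rough per-record and per-day size. This is only a back-of-the-envelope sketch: it assumes "900 GB" means decimal gigabytes, that the ingest rate is steady, and that a "record" corresponds to one stored item; the function name `sizing` is purely illustrative.

```python
# Rough sizing from the numbers quoted in the description.
RECORDS_PER_SECOND = 1_000     # stated ingest rate
RETENTION_DAYS = 21            # stated retention period
TOTAL_BYTES = 900 * 10**9      # "around 900 GB", assumed to be decimal GB
SECONDS_PER_DAY = 86_400


def sizing(records_per_second, retention_days, total_bytes):
    """Return total records, average bytes per record, and bytes per day."""
    total_records = records_per_second * SECONDS_PER_DAY * retention_days
    return {
        "total_records": total_records,
        "bytes_per_record": total_bytes / total_records,
        "bytes_per_day": total_bytes / retention_days,
    }


if __name__ == "__main__":
    s = sizing(RECORDS_PER_SECOND, RETENTION_DAYS, TOTAL_BYTES)
    print(f"records retained : {s['total_records']:,}")
    print(f"avg bytes/record : {s['bytes_per_record']:.0f}")
    print(f"stored per day   : {s['bytes_per_day'] / 1e9:.1f} GB")
```

Under these assumptions, a query spanning the whole retention window has roughly 1.8 billion records behind it, which is why the time range of each search matters so much.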
Following the backend search guide at https://grafana.com/docs/tempo/latest/operations/backend_search/, I have made the following changes so far:
- Using the vParquet4 block format and configuring dedicated_columns for specific span and resource attributes.
- Enabling stream_over_http_enabled to allow Grafana to perform queries via streaming.
- Scaling out the querier by increasing replicas to 6.
- Adjusting the querier’s max_concurrent_queries and queryFrontend’s concurrent_jobs.
- Adding scope to attribute queries in TraceQL, for example:
.http.request.method = "GET" → span.http.request.method = "GET"
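The scoping change in the last item can be sketched as a small helper that builds a scoped TraceQL filter and a search URL for Tempo's `/api/search` endpoint with an explicit time range. This is a minimal sketch: the base URL is a hypothetical in-cluster address, and the helper names (`scoped_attr`, `traceql_equals`, `search_url`) are illustrative and not part of any Tempo client library.

```python
from urllib.parse import urlencode

# Hypothetical address of the query frontend; adjust to the real service.
TEMPO_BASE_URL = "http://tempo-query-frontend:3100"


def scoped_attr(scope, name):
    """Prefix an attribute with an explicit scope, e.g. span.http.request.method."""
    if scope not in ("span", "resource"):
        raise ValueError(f"unsupported scope: {scope!r}")
    return f"{scope}.{name}"


def traceql_equals(scope, name, value):
    """Build a TraceQL filter comparing a scoped string attribute to a value."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{{ {scoped_attr(scope, name)} = "{escaped}" }}'


def search_url(base_url, query, start, end, limit=20):
    """Build a search URL; start and end are Unix epoch seconds."""
    params = urlencode({"q": query, "start": start, "end": end, "limit": limit})
    return f"{base_url}/api/search?{params}"


if __name__ == "__main__":
    query = traceql_equals("span", "http.request.method", "GET")
    # Search a one-hour window rather than the full retention period.
    print(query)
    print(search_url(TEMPO_BASE_URL, query, start=1_700_000_000, end=1_700_003_600))
```

Keeping the scope explicit and the time window narrow are independent choices; the sketch simply puts both in one place so they can be tested together.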
However, despite these adjustments, the performance is still below acceptable levels. Are there any additional optimizations I could make?
My Helm chart values are as follows:
tempo:
  structuredConfig:
    stream_over_http_enabled: true
metricsGenerator:
  enabled: true
  config:
    storage:
      remote_write:
        - url: http://prometheus-server.prometheus.svc.cluster.local/api/v1/write
          send_exemplars: true
ingester:
  resources:
    limits:
      memory: 8Gi
queryFrontend:
  replicas: 2
  config:
    max_outstanding_per_tenant: 2000
    search:
      concurrent_jobs: 100
      target_bytes_per_job: 52428800
querier:
  replicas: 6
  resources:
    limits:
      memory: 10Gi
  config:
    search:
      query_timeout: 60s
    max_concurrent_queries: 30
compactor:
  replicas: 3
  config:
    compaction:
      block_retention: 504h
distributor:
  replicas: 3
traces:
  otlp:
    http:
      enabled: true
    grpc:
      enabled: true
storage:
  trace:
    block:
      version: vParquet4
      dedicated_columns:
        - name: service.name
          type: string
          scope: resource
        - name: k8s.namespace.name
          type: string
          scope: resource
        - name: url.path
          type: string
          scope: span
        - name: http.route
          type: string
          scope: span
        - name: http.target
          type: string
          scope: span
        - name: http.request.method
          type: string
          scope: span
        - name: http.response.status_code
          type: string
          scope: span
        - name: db.name
          type: string
          scope: span
        - name: db.system
          type: string
          scope: span
        - name: peer.service
          type: string
          scope: span
    backend: s3
    s3:
      access_key: 'xxx'
      secret_key: 'xxx'
      bucket: 'tempo-bucket'
      endpoint: 'minio.tenant.svc.cluster.local'
      insecure: true
global_overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics
server:
  http_server_read_timeout: 2m
  http_server_write_timeout: 2m
  grpc_server_max_recv_msg_size: 16777216
  grpc_server_max_send_msg_size: 16777216
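To reason about how these search settings interact, the following sketch estimates how many search jobs a query might be split into for a given time window. It is a rough upper bound under several assumptions that are not from the configuration itself: data is spread evenly across the 504h retention, every byte in the window is eligible for scanning (column pruning and dedicated columns should reduce what is actually read), and the number of jobs in flight is limited by the smaller of `concurrent_jobs` and the total querier slots. The function name `estimate` is illustrative.

```python
import math

# Values taken from the Helm values above.
TARGET_BYTES_PER_JOB = 52_428_800   # queryFrontend search.target_bytes_per_job (50 MiB)
CONCURRENT_JOBS = 100               # queryFrontend search.concurrent_jobs
QUERIER_REPLICAS = 6
MAX_CONCURRENT_QUERIES = 30         # per querier
RETENTION_HOURS = 504               # block_retention: 504h (21 days)

# From the description; assumed to be decimal gigabytes.
TOTAL_BYTES = 900 * 10**9


def estimate(window_hours):
    """Estimate job count and scheduling waves for a search over window_hours."""
    # Assumption: bytes are distributed evenly over the retention period.
    bytes_in_window = TOTAL_BYTES * window_hours / RETENTION_HOURS
    jobs = math.ceil(bytes_in_window / TARGET_BYTES_PER_JOB)
    querier_slots = QUERIER_REPLICAS * MAX_CONCURRENT_QUERIES
    # Assumption: in-flight jobs are bounded by whichever limit is smaller.
    in_flight = min(CONCURRENT_JOBS, querier_slots)
    return {
        "jobs": jobs,
        "querier_slots": querier_slots,
        "in_flight": in_flight,
        "waves": math.ceil(jobs / in_flight),
    }


if __name__ == "__main__":
    for hours in (1, 24, RETENTION_HOURS):
        print(hours, estimate(hours))
```

If the estimate is in the right ballpark, a search across the full retention needs far more sequential waves of jobs than a one-hour search, which suggests comparing query latency at several window sizes before tuning further.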