Add kubernetes manifests for quick deployment #654

Merged · 3 commits · Apr 21, 2021
@@ -1,5 +1,6 @@
---
title: Maximum trace limit reached
weight: 474
---

# I am seeing the error: max live traces per tenant exceeded
14 changes: 14 additions & 0 deletions docs/tempo/website/troubleshooting/too-many-jobs-in-queue.md
@@ -5,6 +5,10 @@ weight: 473

# I am getting the error message ‘Too many jobs in the queue’

The error message might also be one of the following:
- `queue doesn't have room for 100 jobs`
- `failed to add a job to work queue`

You may see this error if the compactor isn’t running and the blocklist size has exploded.
Possible reasons why the compactor may not be running are:

@@ -26,3 +30,13 @@ If this metric is greater than zero (0), check the logs of the compactor for an
- `max_block_bytes` to determine when the ingester cuts blocks. A good number is anywhere from 100MB to 2GB depending on the workload.
- `max_compaction_objects` to determine the max number of objects in a compacted block. This should be relatively high, generally in the millions.
- `retention_duration` for how long traces should be retained in the backend. A sketch of where these settings sit in the configuration follows this list.
- Check the storage section of the config and increase `queue_depth`. Bear in mind that a deeper queue can mean longer
waiting times for query responses. Adjust `max_workers` accordingly; it configures the number of parallel workers
that query backend blocks.
```
storage:
trace:
pool:
max_workers: 100 # the worker pool mainly drives querying, but is also used for polling the blocklist
queue_depth: 10000
```
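
For orientation, here is a minimal sketch of where the settings above might sit in the Tempo configuration, assuming a layout like the compactor ConfigMap added in this PR (which uses `block_retention` for retention). The placement of `max_block_bytes` under `ingester` is inferred from the description above, and exact key names and values may differ between Tempo versions.
```
compactor:
  compaction:
    block_retention: 144h            # how long blocks are retained in the backend
    max_compaction_objects: 6000000  # illustrative; "generally in the millions"
ingester:
  max_block_bytes: 1073741824        # illustrative ~1GB; the ingester cuts a block at this size
```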
58 changes: 58 additions & 0 deletions operations/kube-manifests/ConfigMap-tempo-compactor.yaml
@@ -0,0 +1,58 @@
apiVersion: v1
data:
overrides.yaml: |
overrides: {}
tempo.yaml: |
compactor:
compaction:
block_retention: 144h
chunk_size_bytes: 1.048576e+07
ring:
kvstore:
store: memberlist
distributor:
receivers:
jaeger:
protocols:
grpc:
endpoint: 0.0.0.0:14250
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:55680
ingester:
lifecycler:
ring:
replication_factor: 3
memberlist:
abort_if_cluster_join_fails: false
bind_port: 7946
join_members:
- gossip-ring.tracing.svc.cluster.local:7946
overrides:
per_tenant_override_config: /conf/overrides.yaml
server:
http_listen_port: 3100
storage:
trace:
backend: gcs
blocklist_poll: 10m
cache: memcached
gcs:
bucket_name: tempo
chunk_buffer_size: 1.048576e+07
memcached:
consistent_hash: true
host: memcached
service: memcached-client
timeout: 500ms
pool:
queue_depth: 2000
s3:
bucket: tempo
wal:
path: /var/tempo/wal
kind: ConfigMap
metadata:
name: tempo-compactor
namespace: tracing
55 changes: 55 additions & 0 deletions operations/kube-manifests/ConfigMap-tempo-querier.yaml
@@ -0,0 +1,55 @@
apiVersion: v1
data:
tempo.yaml: |
compactor: null
distributor:
receivers:
jaeger:
protocols:
grpc:
endpoint: 0.0.0.0:14250
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:55680
ingester:
lifecycler:
ring:
replication_factor: 3
memberlist:
abort_if_cluster_join_fails: false
bind_port: 7946
join_members:
- gossip-ring.tracing.svc.cluster.local:7946
overrides:
per_tenant_override_config: /conf/overrides.yaml
querier:
frontend_worker:
frontend_address: query-frontend-discovery.tracing.svc.cluster.local:9095
server:
http_listen_port: 3100
log_level: debug
storage:
trace:
backend: gcs
blocklist_poll: 5m
cache: memcached
gcs:
bucket_name: tempo
chunk_buffer_size: 1.048576e+07
memcached:
consistent_hash: true
host: memcached
service: memcached-client
timeout: 1s
pool:
max_workers: 200
queue_depth: 2000
s3:
bucket: tempo
wal:
path: /var/tempo/wal
kind: ConfigMap
metadata:
name: tempo-querier
namespace: tracing
50 changes: 50 additions & 0 deletions operations/kube-manifests/ConfigMap-tempo-query-frontend.yaml
@@ -0,0 +1,50 @@
apiVersion: v1
data:
tempo.yaml: |
compactor: null
distributor:
receivers:
jaeger:
protocols:
grpc:
endpoint: 0.0.0.0:14250
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:55680
ingester:
lifecycler:
ring:
replication_factor: 3
memberlist:
abort_if_cluster_join_fails: false
bind_port: 7946
join_members:
- gossip-ring.tracing.svc.cluster.local:7946
overrides:
per_tenant_override_config: /conf/overrides.yaml
server:
http_listen_port: 3100
storage:
trace:
backend: gcs
blocklist_poll: "0"
cache: memcached
gcs:
bucket_name: tempo
chunk_buffer_size: 1.048576e+07
memcached:
consistent_hash: true
host: memcached
service: memcached-client
timeout: 500ms
pool:
queue_depth: 2000
s3:
bucket: tempo
wal:
path: /var/tempo/wal
kind: ConfigMap
metadata:
name: tempo-query-frontend
namespace: tracing
8 changes: 8 additions & 0 deletions operations/kube-manifests/ConfigMap-tempo-query.yaml
@@ -0,0 +1,8 @@
apiVersion: v1
data:
tempo-query.yaml: |
backend: localhost:3100
kind: ConfigMap
metadata:
name: tempo-query
namespace: tracing
52 changes: 52 additions & 0 deletions operations/kube-manifests/ConfigMap-tempo.yaml
@@ -0,0 +1,52 @@
apiVersion: v1
data:
overrides.yaml: |
overrides: {}
tempo.yaml: |
compactor: null
distributor:
receivers:
jaeger:
protocols:
grpc:
endpoint: 0.0.0.0:14250
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:55680
ingester:
lifecycler:
ring:
replication_factor: 3
memberlist:
abort_if_cluster_join_fails: false
bind_port: 7946
join_members:
- gossip-ring.tracing.svc.cluster.local:7946
overrides:
per_tenant_override_config: /conf/overrides.yaml
server:
http_listen_port: 3100
storage:
trace:
backend: gcs
blocklist_poll: "0"
cache: memcached
gcs:
bucket_name: tempo
chunk_buffer_size: 1.048576e+07
memcached:
consistent_hash: true
host: memcached
service: memcached-client
timeout: 500ms
pool:
queue_depth: 2000
s3:
bucket: tempo
wal:
path: /var/tempo/wal
kind: ConfigMap
metadata:
name: tempo
namespace: tracing
49 changes: 49 additions & 0 deletions operations/kube-manifests/Deployment-compactor.yaml
@@ -0,0 +1,49 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: compactor
namespace: tracing
spec:
minReadySeconds: 10
replicas: 5
revisionHistoryLimit: 10
selector:
matchLabels:
app: compactor
name: compactor
strategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
template:
metadata:
annotations:
config_hash: 3bc06ce225fc97d13d89fbee2dd6ea25
labels:
app: compactor
name: compactor
spec:
containers:
- args:
- -target=compactor
- -config.file=/conf/tempo.yaml
- -mem-ballast-size-mbs=1024
image: grafana/tempo:latest
imagePullPolicy: IfNotPresent
name: compactor
ports:
- containerPort: 3100
name: prom-metrics
readinessProbe:
httpGet:
path: /ready
port: 3100
initialDelaySeconds: 15
timeoutSeconds: 1
volumeMounts:
- mountPath: /conf
name: tempo-conf
volumes:
- configMap:
name: tempo-compactor
name: tempo-conf
52 changes: 52 additions & 0 deletions operations/kube-manifests/Deployment-distributor.yaml
@@ -0,0 +1,52 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: distributor
namespace: tracing
spec:
minReadySeconds: 10
replicas: 5
revisionHistoryLimit: 10
selector:
matchLabels:
app: distributor
name: distributor
tempo-gossip-member: "true"
strategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
template:
metadata:
annotations:
config_hash: 76b3caf721a80349e206778e56d41a66
labels:
app: distributor
name: distributor
tempo-gossip-member: "true"
spec:
containers:
- args:
- -target=distributor
- -config.file=/conf/tempo.yaml
- -mem-ballast-size-mbs=1024
image: grafana/tempo:latest
imagePullPolicy: IfNotPresent
name: distributor
ports:
- containerPort: 3100
name: prom-metrics
readinessProbe:
httpGet:
path: /ready
port: 3100
initialDelaySeconds: 15
timeoutSeconds: 1
volumeMounts:
- mountPath: /conf
name: tempo-conf
terminationGracePeriodSeconds: 60
volumes:
- configMap:
name: tempo
name: tempo-conf