Tempo 2.0: Config Cleanup #1978

Merged · 19 commits · Jan 11, 2023 · Changes from 11 commits
52 changes: 52 additions & 0 deletions CHANGELOG.md
@@ -73,6 +73,58 @@ Old config will still work but will be removed in a future release. [#1735](http
* Upgrade `github.com/grafana/dskit`
* Upgrade `github.com/grafana/e2e`
* Upgrade `github.com/minio/minio-go/v7`
* [CHANGE] Config updates to prepare for Tempo 2.0. [#1978](https://github.com/grafana/tempo/pull/1978) (@joe-elliott)
Defaults updated:
```
query_frontend:
  max_outstanding_per_tenant: 2000
  search:
    concurrent_jobs: 1000
    target_bytes_per_job: 104857600
    max_duration: 168h
    query_ingesters_until: 30m
  trace_by_id:
    query_shards: 50
querier:
  max_concurrent_queries: 20
  search:
    prefer_self: 10
ingester:
  concurrent_flushes: 4
  max_block_duration: 30m
  max_block_bytes: 524288000
storage:
  trace:
    pool:
      max_workers: 400
      queue_depth: 20000
    search:
      read_buffer_count: 32
      read_buffer_size_bytes: 1048576
```
**BREAKING CHANGE** Renamed/removed/moved
```
query_frontend:
  query_shards: // removed. use trace_by_id.query_shards
querier:
  query_timeout: // removed. use trace_by_id.query_timeout
compactor:
  compaction:
    chunk_size_bytes: // renamed to v2_in_buffer_bytes
    flush_size_bytes: // renamed to v2_out_buffer_bytes
    iterator_buffer_size: // renamed to v2_prefetch_traces_count
ingester:
  use_flatbuffer_search: // removed. automatically set based on block type
storage:
  wal:
    encoding: // renamed to v2_encoding
    version: // removed and pinned to block.version
  block:
    index_downsample_bytes: // renamed to v2_index_downsample_bytes
    index_page_size_bytes: // renamed to v2_index_page_size_bytes
    encoding: // renamed to v2_encoding
    row_group_size_bytes: // renamed to parquet_row_group_size_bytes
```
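To illustrate the renames, a hypothetical pre-2.0 fragment such as the following (values are placeholders only):
```
query_frontend:
  query_shards: 50
querier:
  query_timeout: 10s
storage:
  trace:
    wal:
      encoding: snappy
    block:
      version: v2
      encoding: zstd
      index_downsample_bytes: 1048576
```
is expressed with the new names as:
```
query_frontend:
  trace_by_id:
    query_shards: 50
querier:
  trace_by_id:
    query_timeout: 10s
storage:
  trace:
    wal:
      v2_encoding: snappy
    block:
      version: v2
      v2_encoding: zstd
      v2_index_downsample_bytes: 1048576
```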
* [FEATURE] Add capability to configure the used S3 Storage Class [#1697](https://github.com/grafana/tempo/pull/1714) (@amitsetty)
* [ENHANCEMENT] cache: expose username and sentinel_username redis configuration options for ACL-based Redis Auth support [#1708](https://github.com/grafana/tempo/pull/1708) (@jsievenpiper)
* [ENHANCEMENT] metrics-generator: expose span size as a metric [#1662](https://github.com/grafana/tempo/pull/1662) (@ie-pham)
19 changes: 0 additions & 19 deletions cmd/tempo/app/config.go
@@ -20,7 +20,6 @@ import (
"github.com/grafana/tempo/pkg/usagestats"
"github.com/grafana/tempo/pkg/util"
"github.com/grafana/tempo/tempodb"
v2 "github.com/grafana/tempo/tempodb/encoding/v2"
"github.com/prometheus/client_golang/prometheus"
"github.com/weaveworks/common/server"
)
@@ -156,16 +155,6 @@ func (c *Config) CheckConfig() []ConfigWarning {
warnings = append(warnings, warnStorageTraceBackendLocal)
}

// flatbuffers are configured but we're not using v2
if c.Ingester.UseFlatbufferSearch && c.StorageConfig.Trace.Block.Version != v2.VersionString {
warnings = append(warnings, warnFlatBuffersNotNecessary)
}

// we're using v2 but flatbuffers are not configured
if !c.Ingester.UseFlatbufferSearch && c.StorageConfig.Trace.Block.Version == v2.VersionString {
warnings = append(warnings, warnIngesterSearchWillNotWork)
}

return warnings
}

@@ -235,12 +224,4 @@ var (
warnStorageTraceBackendLocal = ConfigWarning{
Message: "Local backend will not correctly retrieve traces with a distributed deployment unless all components have access to the same disk. You should probably be using object storage as a backend.",
}
warnFlatBuffersNotNecessary = ConfigWarning{
Message: "Flatbuffers enabled with a block type that supports search.",
Explain: "The configured block type supports local search in the ingester. Flatbuffers are not necessary and will consume extra resources.",
}
warnIngesterSearchWillNotWork = ConfigWarning{
Message: "Flatbuffers disabled with a block type that does not support search",
Explain: "Flatbuffers are disabled but the configured block type does not support ingester search. This can be ignored if only trace by id lookup is desired.",
}
)
18 changes: 0 additions & 18 deletions cmd/tempo/app/config_test.go
@@ -62,24 +62,6 @@ func TestConfig_CheckConfig(t *testing.T) {
}(),
expect: []ConfigWarning{warnStorageTraceBackendLocal},
},
{
name: "warn ingester search",
config: func() *Config {
cfg := newDefaultConfig()
cfg.StorageConfig.Trace.Block.Version = "v2"
return cfg
}(),
expect: []ConfigWarning{warnIngesterSearchWillNotWork},
},
{
name: "warn flatbuffers not necessary",
config: func() *Config {
cfg := newDefaultConfig()
cfg.Ingester.UseFlatbufferSearch = true
return cfg
}(),
expect: []ConfigWarning{warnFlatBuffersNotNecessary},
},
}

for _, tc := range tt {
4 changes: 4 additions & 0 deletions cmd/tempo/app/modules.go
@@ -38,6 +38,7 @@ import (
"github.com/grafana/tempo/tempodb/backend/gcs"
"github.com/grafana/tempo/tempodb/backend/local"
"github.com/grafana/tempo/tempodb/backend/s3"
v2 "github.com/grafana/tempo/tempodb/encoding/v2"
)

// The various modules that make up tempo.
@@ -145,6 +146,9 @@ func (t *App) initDistributor() (services.Service, error) {
}

func (t *App) initIngester() (services.Service, error) {
// always use flatbuffer search if we're using the v2 blocks. todo: in 2.1 remove flatbuffer search altogether
t.cfg.Ingester.UseFlatbufferSearch = (t.cfg.StorageConfig.Trace.Block.Version == v2.VersionString)

t.cfg.Ingester.LifecyclerConfig.ListenPort = t.cfg.Server.GRPCListenPort
ingester, err := ingester.New(t.cfg.Ingester, t.store, t.overrides, prometheus.DefaultRegisterer)
if err != nil {
74 changes: 33 additions & 41 deletions docs/tempo/website/configuration/_index.md
@@ -213,22 +213,16 @@ ingester:
[flush_check_period: <duration>]

# maximum size of a block before cutting it
# (default: 1073741824 = 1GB)
# (default: 524288000 = 500MB)
[max_block_bytes: <int>]

# maximum length of time before cutting a block
# (default: 1h)
# (default: 30m)
[max_block_duration: <duration>]

# duration to keep blocks in the ingester after they have been flushed
# (default: 15m)
[ complete_block_timeout: <duration>]

# If true then flatbuffer search metadata files are created and used in the ingester for search,
# search tags and search tag values. If false then the blocks themselves are used for search in the ingesters.
# Warning: v2 blocks do not support ingester search without this enabled.
# (default: false)
[ use_flatbuffer_search: <bool> ]
```
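The lower 2.0 defaults above can still be overridden per deployment. A minimal sketch that restores the pre-2.0 cut points, shown purely as an example:
```
ingester:
  # pre-2.0 defaults; Tempo 2.0 lowers these to 30m and 524288000 (500MB)
  max_block_duration: 1h
  max_block_bytes: 1073741824
```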

## Metrics-generator
@@ -361,10 +355,6 @@ query_frontend:
# (default: 2)
[max_retries: <int>]

# The number of shards to split a trace by id query into.
# (default: 20)
[query_shards: <int>]

# number of block queries that are tolerated to error before considering the entire query as failed
# numbers greater than 0 make possible for a read to return partial results
# (default: 0)
@@ -373,11 +363,11 @@ query_frontend:
search:

# The number of concurrent jobs to execute when searching the backend.
# (default: 50)
# (default: 1000)
[concurrent_jobs: <int>]

# The target number of bytes for each job to handle when performing a backend search.
# (default: 10485760)
# (default: 104857600)
[target_bytes_per_job: <int>]

# Limit used for search requests if none is set by the caller
@@ -392,7 +382,7 @@ query_frontend:

# The maximum allowed time range for a search.
# 0 disables this limit.
# (default: 1h1m0s)
# (default: 168h)
[max_duration: <duration>]

# query_backend_after and query_ingesters_until together control where the query-frontend searches for traces.
@@ -403,11 +393,14 @@ query_frontend:
# (default: 15m)
[query_backend_after: <duration>]

# (default: 1h)
# (default: 30m)
[query_ingesters_until: <duration>]

# Trace by ID lookup configuration
trace_by_id:
# The number of shards to split a trace by id query into.
# (default: 50)
[query_shards: <int>]

# If set to a non-zero value, a second request will be issued at the provided duration.
# Recommended to be set to p99 of search requests to reduce long-tail latency.
@@ -428,14 +421,11 @@ The Querier is responsible for querying the backends/cache for the traceID.
# querier config block
querier:

# Timeout for trace lookup requests
[query_timeout: <duration> | default = 10s]

# The query frontend turns both trace by id (/api/traces/<id>) and search (/api/search?<params>) requests
# into subqueries that are then pulled and serviced by the queriers.
# This value controls the overall number of simultaneous subqueries that the querier will service at once. It does
# not distinguish between the types of queries.
[max_concurrent_queries: <int> | default = 5]
[max_concurrent_queries: <int> | default = 20]

# The query frontend sends sharded requests to ingesters and querier (/api/traces/<id>)
# By default, all healthy ingesters are queried for the trace id.
@@ -444,6 +434,10 @@ querier:
# If this parameter is set, the number of 404s could increase during rollout or scaling of ingesters.
[query_relevant_ingesters: <bool> | default = false]

trace_by_id:
# Timeout for trace lookup requests
[query_timeout: <duration> | default = 10s]

search:
# Timeout for search requests
[query_timeout: <duration> | default = 30s]
@@ -459,7 +453,7 @@ querier:
# number of subqueries. In the default case of 10 the querier will process up to 10 search request subqueries before starting
# to reach out to search_external_endpoints.
# Setting this to 0 will disable this feature and the querier will proxy all search subqueries to search_external_endpoints.
[prefer_self: <int> | default = 2 ]
[prefer_self: <int> | default = 10 ]

# If set to a non-zero value a second request will be issued at the provided duration. Recommended to
# be set to p99 of external search requests to reduce long tail latency.
@@ -514,12 +508,6 @@ compactor:
# Optional. Blocks in this time window will be compacted together. Default is 1h.
[compaction_window: <duration>]

# Optional. Amount of data to buffer from input blocks. Default is 5 MiB.
[chunk_size_bytes: <int>]

# Optional. Flush data to backend when buffer is this large. Default is 30 MB.
[flush_size_bytes: <int>]

# Optional. Maximum number of traces in a compacted block. Default is 6 million.
# WARNING: Deprecated. Use max_block_bytes instead.
[max_compaction_objects: <int>]
Expand All @@ -530,16 +518,21 @@ compactor:
# Optional. Number of tenants to process in parallel during retention. Default is 10.
[retention_concurrency: <int>]

# Optional. Number of traces to buffer in memory during compaction. Increasing may improve performance but will also increase memory usage. Default is 1000.
[iterator_buffer_size: <int>]

# Optional. The maximum amount of time to spend compacting a single tenant before moving to the next. Default is 5m.
[max_time_per_tenant: <duration>]

# Optional. The time between compaction cycles. Default is 30s.
# Note: The default will be used if the value is set to 0.
[compaction_cycle: <duration>]

# Optional. Amount of data to buffer from input blocks. Default is 5 MiB.
[v2_in_buffer_bytes: <int>]

# Optional. Flush data to backend when buffer is this large. Default is 30 MB.
[v2_out_buffer_bytes: <int>]

# Optional. Number of traces to buffer in memory during compaction. Increasing may improve performance but will also increase memory usage. Default is 1000.
[v2_prefetch_traces_count: <int>]
```
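The v2_* fields above replace chunk_size_bytes, flush_size_bytes and iterator_buffer_size one-for-one. A minimal compaction block using the new names (byte values are illustrative, not tuned recommendations):
```
compactor:
  compaction:
    compaction_window: 1h
    v2_in_buffer_bytes: 5242880        # replaces chunk_size_bytes
    v2_out_buffer_bytes: 31457280      # replaces flush_size_bytes
    v2_prefetch_traces_count: 1000     # replaces iterator_buffer_size
```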

## Storage
@@ -786,12 +779,12 @@ storage:

# Size of read buffers used when performing search on a vparquet block. This value times the read_buffer_count
# is the total amount of bytes used for buffering when performing search on a parquet block.
# Default: 4194304
# Default: 1048576
[read_buffer_size_bytes: <int>]

# Number of read buffers used when performing search on a vparquet block. This value times the read_buffer_size_bytes
# is the total amount of bytes used for buffering when performing search on a parquet block.
# Default: 8
# Default: 32
[read_buffer_count: <int>]

# Granular cache control settings for parquet metadata objects
@@ -923,11 +916,11 @@ storage:
# the worker pool is used primarily when finding traces by id, but is also used by other
pool:

# total number of workers pulling jobs from the queue (default: 50)
# total number of workers pulling jobs from the queue (default: 400)
[max_workers: <int>]

# length of job queue. important for querier as it queues a job for every block it has to search
# (default: 10000)
# (default: 20000)
[queue_depth: <int>]

# Configuration block for the Write Ahead Log (WAL)
@@ -939,7 +932,7 @@ storage:

# wal encoding/compression.
# options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
[encoding: <string> | default = snappy]
[v2_encoding: <string> | default = snappy]

# Defines the search data encoding/compression protocol.
# Options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
@@ -957,6 +950,8 @@ storage:

# block configuration
block:
# block format version. options: v2, vParquet
[version: <string> | default = vParquet]

# bloom filter false positive rate. lower values create larger filters but fewer false positives
[bloom_filter_false_positive: <float> | default = 0.01]
@@ -965,13 +960,10 @@ storage:
[bloom_filter_shard_size_bytes: <int> | default = 100KiB]

# number of bytes per index record
[index_downsample_bytes: <uint64> | default = 1MiB]

# block format version. options: v2, vParquet
[version: <string> | default = v2]
[v2_index_downsample_bytes: <uint64> | default = 1MiB]

# block encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
[encoding: <string> | default = zstd]
[v2_encoding: <string> | default = zstd]

# search data encoding/compression. same options as block encoding.
[search_encoding: <string> | default = snappy]
Expand All @@ -982,7 +974,7 @@ storage:
# an estimate of the number of bytes per row group when cutting Parquet blocks. lower values will
# create larger footers but will be harder to shard when searching. It is difficult to calculate
# this field directly and it may vary based on workload. This is roughly a lower bound.
[row_group_size_bytes: <int> | default = 100MB]
[parquet_row_group_size_bytes: <int> | default = 100MB]
```
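As the prefixes suggest, the v2_*-prefixed options above apply to the v2 block format, while parquet_row_group_size_bytes applies to vParquet (the new default). A sketch of a block section for each format, with illustrative values:
```
storage:
  trace:
    block:
      version: vParquet                       # 2.0 default
      parquet_row_group_size_bytes: 100000000 # example value
```
and for the previously-default v2 format:
```
storage:
  trace:
    block:
      version: v2
      v2_encoding: zstd
      v2_index_downsample_bytes: 1048576
      v2_index_page_size_bytes: 256000        # example value
```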

## Memberlist