keyvisualizer: Pre-aggregate boundaries propagated to the collectors #96740

Closed
Tracked by #96737
zachlite opened this issue Feb 7, 2023 · 0 comments · Fixed by #98707
Labels
A-kv-observability, C-enhancement (solution expected to add code/behavior and preserve backward-compat; pg compat issues are the exception)

Comments


zachlite commented Feb 7, 2023

Currently, in the SpanStatsConsumer, the boundaries sent to the collectors are taken from the system tenant's ranges spanning [Min, Max). This is implemented here.

If there are more than 10,000 boundaries, they should be pre-aggregated so that the cost of propagating and storing them stays negligible.

A possible implementation could be as follows:
For example, if there are 100,000 ranges, every 10 ranges would be combined into one:

[a, b), [b, c), ... [i, j) combines to [a, j)
[k, l), [m, n), ... [s, t) combines to [k, t)

and so on.

So the boundaries communicated to the collectors would be [a, j), [k, t), ... etc.
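A minimal sketch of this stride-based combining, assuming a hypothetical `Span` type and an `aggregateBoundaries` helper; this is illustrative only, not the actual keyvisualizer implementation:

```go
package main

import "fmt"

// Span is a hypothetical stand-in for a range boundary [Key, EndKey).
type Span struct {
	Key    string // inclusive start key
	EndKey string // exclusive end key
}

// aggregateBoundaries combines every `stride` adjacent spans into one span
// covering [first.Key, last.EndKey), so that at most maxBuckets spans remain.
func aggregateBoundaries(spans []Span, maxBuckets int) []Span {
	if maxBuckets <= 0 || len(spans) <= maxBuckets {
		return spans
	}
	// Ceiling division keeps the result at or below maxBuckets.
	stride := (len(spans) + maxBuckets - 1) / maxBuckets
	combined := make([]Span, 0, maxBuckets)
	for i := 0; i < len(spans); i += stride {
		end := i + stride
		if end > len(spans) {
			end = len(spans)
		}
		combined = append(combined, Span{Key: spans[i].Key, EndKey: spans[end-1].EndKey})
	}
	return combined
}

func main() {
	spans := []Span{{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "e"}}
	// With a budget of 2 buckets, every 2 spans collapse into one: [{a c} {c e}]
	fmt.Println(aggregateBoundaries(spans, 2))
}
```

With 100,000 input spans and a budget of 10,000 buckets, the stride works out to 10, matching the example above.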

Jira issue: CRDB-24304

zachlite added the C-enhancement and T-kv-observability labels on Feb 7, 2023
zachlite added a commit to zachlite/cockroach that referenced this issue Mar 15, 2023
Previously, there was no bound on the number of ranges that
could be propagated to the collectors. After collection,
data was downsampled using a simple heuristic to decide
whether a bucket was worth keeping or should be aggregated
with its neighbor.

In this commit, I've introduced a function, `maybeAggregateBoundaries`,
to prevent more than `keyvisualizer.max_buckets` boundaries from being
propagated to collectors. This pre-aggregation takes the place of the
post-collection downsampling. For the first stable release of the key
visualizer, I am intentionally sacrificing dynamic resolution and
prioritizing boundary stability instead. This trade-off means that the
key visualizer will demand fewer network, memory, and storage resources
from the cluster while operating.

Additionally, this PR drops the sample retention time from 14 days to 7 days,
and ensures that `keyvisualizer.max_buckets` is bounded within [1, 1024].

Resolves: cockroachdb#96740
Epic: None
Release note: None
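For illustration, here is a hedged sketch of the two constraints the commit message describes: the [1, 1024] clamp on `keyvisualizer.max_buckets` and the cap on propagated boundaries. `clampMaxBuckets` and `needsAggregation` are made-up helper names; the real setting registration and aggregation code in the keyvisualizer package are not reproduced here.

```go
package keyvissketch

// clampMaxBuckets restricts the configured bucket budget to [1, 1024],
// mirroring the bound described in the commit message.
func clampMaxBuckets(requested int) int {
	const lo, hi = 1, 1024
	switch {
	case requested < lo:
		return lo
	case requested > hi:
		return hi
	default:
		return requested
	}
}

// needsAggregation reports whether the boundary count exceeds the clamped
// budget and therefore must be pre-aggregated before being propagated.
func needsAggregation(numBoundaries, requestedMaxBuckets int) bool {
	return numBoundaries > clampMaxBuckets(requestedMaxBuckets)
}
```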
craig bot pushed a commit that referenced this issue Mar 16, 2023
96811: loqrecovery: support mixed version recovery r=erikgrinaker a=aliher1911

This commit adds mixed-version support to the half-online loss of quorum recovery service and CLI tools.
This change allows users to run loss of quorum recovery in partially upgraded clusters by tracking the version that generated the data and producing recovery plans with the same version, so that versions can be verified at every step of recovery.
The general rule is that you may only use data from a cluster that is not newer than the binary version, to avoid dropping new information. This rule applies both to the planning process, where the planner must understand the replica info, and to the cockroach node that applies the plan, which must have been created by an equal or lower version. An additional restriction is that the planner preserves the version in the plan and does not use any new features if the processed info is older than the binary version. This is no different from what version gates do in cockroach.

Release note: None

Fixes #95344
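As a rough, generic illustration of the version rule above (data and plans must not be newer than the binary consuming them), here is a sketch; `Version` and `checkNotNewer` are hypothetical stand-ins, not the loqrecovery package's types.

```go
package loqsketch

import "fmt"

// Version is a simplified stand-in for a cluster or binary version.
type Version struct {
	Major, Minor int32
}

// Less reports whether v is older than o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// checkNotNewer enforces the general rule above: replica info or a recovery
// plan created at a version newer than the consuming binary is rejected, so
// that no new information is silently dropped.
func checkNotNewer(binary, data Version, what string) error {
	if binary.Less(data) {
		return fmt.Errorf("%s version %+v is newer than binary version %+v", what, data, binary)
	}
	return nil
}
```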

98707: keyvisualizer: pre-aggregate ranges r=zachlite a=zachlite

Previously, there was no bound on the number of ranges that could be propagated to the collectors. After collection, data was downsampled using a simple heuristic to decide whether a bucket was worth keeping or should be aggregated with its neighbor.

In this commit, I've introduced a function, `maybeAggregateBoundaries`, to prevent more than `keyvisualizer.max_buckets` boundaries from being propagated to collectors. This pre-aggregation takes the place of the post-collection downsampling. For the first stable release of the key visualizer, I am intentionally sacrificing dynamic resolution and prioritizing boundary stability instead. This trade-off means that the key visualizer will demand fewer network, memory, and storage resources from the cluster while operating.

Additionally, this PR drops the sample retention time from 14 days to 7 days, and ensures that `keyvisualizer.max_buckets` is bounded within [1, 1024].

Resolves: #96740
Epic: None
Release note: None

98713: sql,kv: bubble up retry errors when creating leaf transactions r=arulajmani a=arulajmani

Previously, if we detected that the transaction was aborted when trying to construct leaf transaction state, we would handle the retry error instead of bubbling it up to the caller. When a transaction is aborted, the `TransactionRetryWithProtoRefreshError` carries with it a new transaction that should be used for subsequent attempts. Handling the retry error entailed swapping out the old `TxnCoordSender` with a new one -- one that is associated with this new transaction.

This is bug-prone when trying to create multiple leaf transactions in parallel if the root has been aborted. We would expect the first leaf transaction to handle the error and all subsequent leaf transactions to point to the new transaction, as the `TxnCoordSender` has been swapped out. This wasn't an issue before, as we never really created multiple leaf transactions in parallel. That recently changed in 0f4b431, which started parallelizing FK and uniqueness checks. With this change, we could see FK or uniqueness violations when in fact the transaction needed to be retried.

This patch fixes the issue described above by not handling the retry error when creating leaf transactions. Instead, we expect the ConnExecutor to retry the entire transaction and prepare it for another iteration.

Fixes #97141

Epic: none

Release note: None
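To make the behavioral change concrete, here is a rough sketch using simplified stand-in types (not the actual `TxnCoordSender` or SQL code): the old path handled the retry error by swapping transaction state in place, while the fix returns the error so the caller retries the whole transaction.

```go
package leafsketch

import "errors"

// errTxnRetry stands in for the retry error that carries a fresh transaction.
var errTxnRetry = errors.New("transaction retry error: root txn aborted")

// rootTxn is a simplified root transaction: an aborted flag plus an epoch
// that increments whenever a "new" transaction is swapped in.
type rootTxn struct {
	aborted bool
	epoch   int
}

// leafStateOld models the previous behavior: it handles the retry error by
// resetting the root in place and proceeding. When several leaves are created
// in parallel, callers can observe different epochs and report spurious
// FK or uniqueness violations.
func leafStateOld(root *rootTxn) (epoch int, err error) {
	if root.aborted {
		root.aborted = false
		root.epoch++ // swap in the "new" transaction in place
	}
	return root.epoch, nil
}

// leafStateNew models the fix: the retry error is bubbled up, and the caller
// (the ConnExecutor in the real system) retries the entire transaction before
// creating any leaves.
func leafStateNew(root *rootTxn) (epoch int, err error) {
	if root.aborted {
		return 0, errTxnRetry
	}
	return root.epoch, nil
}
```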

98732: cloud/gcp_test: add weird code 0/ok error to regex r=dt a=dt

Still unsure why we sometimes see this instead of the other, more informative errors, but in the meantime, make the test pass.

Release note: none.
Epic: none.

Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: zachlite <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: David Taylor <[email protected]>
craig bot closed this as completed in f8ba475 on Mar 16, 2023