keyvisualizer: Pre-aggregate boundaries propagated to the collectors #96740
Closed. Tracked by #96737.

Labels: A-kv-observability; C-enhancement (solution expected to add code/behavior and preserve backward compatibility; pg compat issues are the exception)
zachlite added a commit to zachlite/cockroach that referenced this issue on Mar 15, 2023:
Previously, there was no bound on the number of ranges that could be propagated to the collectors. After collection, data was downsampled using a simple heuristic to decide if a bucket was worth keeping or if it should be aggregated with its neighbor.

In this commit, I've introduced a function, `maybeAggregateBoundaries`, to prevent more than `keyvisualizer.max_buckets` from being propagated to collectors. This pre-aggregation takes the place of the post-collection downsampling.

For the first stable release of the key visualizer, I am intentionally sacrificing dynamic resolution and prioritizing boundary stability instead. This trade-off means that the key visualizer will demand fewer network, memory, and storage resources from the cluster while operating.

Additionally, this PR drops the sample retention time from 14 days to 7 days, and ensures that `keyvisualizer.max_buckets` is bounded between [1, 1024].

Resolves: cockroachdb#96740
Epic: None
Release note: None
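For illustration, here is a minimal Go sketch of bounding a requested bucket count to the [1, 1024] range described above. `clampMaxBuckets` is a hypothetical helper, not the actual CockroachDB cluster-setting validation code:

```go
package keyvisualizer

// clampMaxBuckets bounds a requested keyvisualizer.max_buckets value to
// [1, 1024], mirroring the constraint described in the commit message.
// Hypothetical helper, not the actual setting-validation code.
func clampMaxBuckets(requested int64) int64 {
	const lo, hi = 1, 1024
	switch {
	case requested < lo:
		return lo
	case requested > hi:
		return hi
	default:
		return requested
	}
}
```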
craig bot pushed a commit that referenced this issue on Mar 16, 2023:
96811: loqrecovery: support mixed version recovery r=erikgrinaker a=aliher1911

This commit adds mixed-version support for the half-online loss-of-quorum recovery service and CLI tools. This change allows users to use loss-of-quorum recovery in partially upgraded clusters by tracking the version that generated the data and producing recovery plans with an identical version, so that versions can be verified at all steps of recovery. The general rule is that you can use data from a cluster that is not newer than the binary version, to avoid new information being dropped. This rule applies to the planning process, where the planner should understand the replica info, and also to the cockroach node that applies the plan, which should have been created by an equal or lower version. An additional restriction on the planner is to preserve the version in the plan and not use any new features if the processed info is older than the binary version. This is no different from what version gates do in cockroach.

Release note: None
Fixes #95344

98707: keyvisualizer: pre-aggregate ranges r=zachlite a=zachlite

Previously, there was no bound on the number of ranges that could be propagated to the collectors. After collection, data was downsampled using a simple heuristic to decide if a bucket was worth keeping or if it should be aggregated with its neighbor. In this commit, I've introduced a function, `maybeAggregateBoundaries`, to prevent more than `keyvisualizer.max_buckets` from being propagated to collectors. This pre-aggregation takes the place of the post-collection downsampling. For the first stable release of the key visualizer, I am intentionally sacrificing dynamic resolution and prioritizing boundary stability instead. This trade-off means that the key visualizer will demand fewer network, memory, and storage resources from the cluster while operating. Additionally, this PR drops the sample retention time from 14 days to 7 days, and ensures that `keyvisualizer.max_buckets` is bounded between [1, 1024].

Resolves: #96740
Epic: None
Release note: None

98713: sql,kv: bubble up retry errors when creating leaf transactions r=arulajmani a=arulajmani

Previously, if we detected that the transaction was aborted when trying to construct leaf transaction state, we would handle the retry error instead of bubbling it up to the caller. When a transaction is aborted, the `TransactionRetryWithProtoRefreshError` carries with it a new transaction that should be used for subsequent attempts. Handling the retry error entailed swapping out the old `TxnCoordSender` with a new one, one that is associated with this new transaction. This is bug-prone when trying to create multiple leaf transactions in parallel if the root has been aborted: we would expect the first leaf transaction to handle the error and all subsequent leaf transactions to point to the new transaction, as the `TxnCoordSender` has been swapped out. This wasn't an issue before, as we never really created multiple leaf transactions in parallel. That recently changed in 0f4b431, which started parallelizing FK and uniqueness checks. With this change, we could see FK or uniqueness violations when in fact the transaction needed to be retried. This patch fixes the issue described above by not handling the retry error when creating leaf transactions. Instead, we expect the ConnExecutor to retry the entire transaction and prepare it for another iteration.

Fixes #97141
Epic: none
Release note: None

98732: cloud/gcp_test: add weird code 0/ok error to regex r=dt a=dt

Still unsure why we sometimes see this instead of the other, more informative errors, but in the meantime, make the test pass.

Release note: none.
Epic: none.

Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: zachlite <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: David Taylor <[email protected]>
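The version rule in the #96811 section above ("data may not be newer than the binary that processes it") can be made concrete with a small sketch. The `Version` type and `checkPlanCompatible` helper below are hypothetical stand-ins, not the actual loqrecovery API:

```go
package main

import "fmt"

// Version is a simplified stand-in for a CockroachDB cluster/binary version.
type Version struct {
	Major, Minor int
}

// Less reports whether v is an older version than o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// checkPlanCompatible enforces the rule described above: a node may only
// apply a recovery plan created by an equal or lower version, so a plan
// newer than the running binary is rejected.
func checkPlanCompatible(binary, plan Version) error {
	if binary.Less(plan) {
		return fmt.Errorf("plan version %d.%d is newer than binary version %d.%d",
			plan.Major, plan.Minor, binary.Major, binary.Minor)
	}
	return nil
}

func main() {
	// A 23.1 binary can apply a 22.2 plan, but not the other way around.
	fmt.Println(checkPlanCompatible(Version{23, 1}, Version{22, 2})) // <nil>
	fmt.Println(checkPlanCompatible(Version{22, 2}, Version{23, 1})) // error
}
```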
Currently in the `SpanStatsConsumer`, the boundaries sent to the collector are taken from the system tenant's ranges from [Min, Max). This is implemented here. If there are more than 10,000 boundaries, they should be pre-aggregated so that the cost of propagating and storing them stays negligible.
A possible implementation could be as follows: as an example, if there are 100,000 ranges, every 10 ranges would be combined into one. So the boundaries communicated to the collector would be [a, j), [k, t), etc.
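A minimal Go sketch of this stride-based pre-aggregation, under the assumption that range boundaries arrive as contiguous [Key, EndKey) spans. The `Span` type and `preAggregate` helper are illustrative, not the actual `maybeAggregateBoundaries` implementation from the PR:

```go
package main

import "fmt"

// Span is a simplified stand-in for a range's [Key, EndKey) boundary pair.
type Span struct {
	Key, EndKey string
}

// preAggregate merges runs of adjacent spans so that at most maxBuckets
// boundaries remain. With 100,000 ranges and maxBuckets = 10,000, the
// stride is 10: every 10 consecutive ranges collapse into one bucket.
func preAggregate(spans []Span, maxBuckets int) []Span {
	if len(spans) <= maxBuckets {
		return spans // already within the budget; nothing to aggregate
	}
	// Ceiling division guarantees the result has at most maxBuckets buckets.
	stride := (len(spans) + maxBuckets - 1) / maxBuckets
	out := make([]Span, 0, maxBuckets)
	for i := 0; i < len(spans); i += stride {
		end := i + stride
		if end > len(spans) {
			end = len(spans)
		}
		// Adjacent spans are contiguous, so the merged bucket runs from the
		// first span's start key to the last span's end key.
		out = append(out, Span{Key: spans[i].Key, EndKey: spans[end-1].EndKey})
	}
	return out
}

func main() {
	// Build 100,000 contiguous toy ranges and aggregate them down to 10,000.
	spans := make([]Span, 100000)
	for i := range spans {
		spans[i] = Span{Key: fmt.Sprintf("k%06d", i), EndKey: fmt.Sprintf("k%06d", i+1)}
	}
	buckets := preAggregate(spans, 10000)
	fmt.Println(len(buckets)) // 10000
}
```

A fixed stride keeps bucket boundaries stable at the cost of resolution, which matches the trade-off stated in the commit message above.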
Jira issue: CRDB-24304