-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlinstance: optimize for MR cold-start #85737
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
6 tasks
jaylim-crl
added a commit
to jaylim-crl/cockroach
that referenced
this issue
Oct 31, 2022
…ances Related to cockroachdb#85737. Previously, when a SQL pod starts up, we will allocate an instance ID for it synchronously. This can be a problem for multi-region setup because that would now involve a read and write across all regions. This commit is part of the work to make the system.sql_instances table REGIONAL BY ROW to reduce cold start times. Here, we would asynchronously pre-allocate instance IDs, and the startup process will then claim an ID for the SQL pod. Once the sql_instances table is made REGIONAL BY ROW, we can update the logic to pre-allocate for every region, and only perform a regional lookup when claiming an ID. Epic: None Release note (sql change): The `system.sql_instances` table now includes pre-allocated ID entries, where all the fields except `id` will be NULL.
jaylim-crl
added a commit
to jaylim-crl/cockroach
that referenced
this issue
Nov 4, 2022
…ances Related to cockroachdb#85737. Previously, when a SQL pod starts up, we will allocate an instance ID for it synchronously. This can be a problem for multi-region setup because that would now involve a read and write across all regions. This commit is part of the work to make the system.sql_instances table REGIONAL BY ROW to reduce cold start times. Here, we would asynchronously pre-allocate instance IDs, and the startup process will then claim an ID for the SQL pod. Once the sql_instances table is made REGIONAL BY ROW, we can update the logic to pre-allocate for every region, and only perform a regional lookup when claiming an ID. Epic: None Release note (sql change): The `system.sql_instances` table now includes pre-allocated ID entries, where all the fields except `id` will be NULL.
jaylim-crl
added a commit
to jaylim-crl/cockroach
that referenced
this issue
Nov 7, 2022
…ances Related to cockroachdb#85737. Previously, when a SQL pod starts up, we will allocate an instance ID for it synchronously. This can be a problem for multi-region setup because that would now involve a read and write across all regions. This commit is part of the work to make the system.sql_instances table REGIONAL BY ROW to reduce cold start times. Here, we would asynchronously pre-allocate instance IDs, and the startup process will then claim an ID for the SQL pod. Once the sql_instances table is made REGIONAL BY ROW, we can update the logic to pre-allocate for every region, and only perform a regional lookup when claiming an ID. Epic: None Release note (sql change): The `system.sql_instances` table now includes pre-allocated ID entries, where all the fields except `id` will be NULL.
jaylim-crl
added a commit
to jaylim-crl/cockroach
that referenced
this issue
Nov 7, 2022
…ances Related to cockroachdb#85737. Previously, when a SQL pod starts up, we will allocate an instance ID for it synchronously. This can be a problem for multi-region setup because that would now involve a read and write across all regions. This commit is part of the work to make the system.sql_instances table REGIONAL BY ROW to reduce cold start times. Here, we would asynchronously pre-allocate instance IDs, and the startup process will then claim an ID for the SQL pod. Once the sql_instances table is made REGIONAL BY ROW, we can update the logic to pre-allocate for every region, and only perform a regional lookup when claiming an ID. At the same time, we will now use a cached reader of the sqlLivenessProvider with the SQL instance storage. This is fine since the cached reader is on the conservative side, and will return true on IsAlive if it does not know. The only downside to this is that we leave instance rows lying around in the sql_instances table longer, but that is fine since they will eventually be reclaimed once the cache gets updated. Epic: None Release note (sql change): The `system.sql_instances` table now includes pre-allocated ID entries, where all the fields except `id` will be NULL.
craig bot
pushed a commit
that referenced
this issue
Nov 8, 2022
90427: instancestorage: asynchronously pre-allocate instance IDs in sql_instances r=JeffSwenson,ajwerner a=jaylim-crl Related to #85737. Previously, when a SQL pod starts up, we will allocate an instance ID for it synchronously. This can be a problem for multi-region setup because that would now involve a read and write across all regions. This commit is part of the work to make the system.sql_instances table REGIONAL BY ROW to reduce cold start times. Here, we would asynchronously pre-allocate instance IDs, and the startup process will then claim an ID for the SQL pod. Once the sql_instances table is made REGIONAL BY ROW, we can update the logic to pre-allocate for every region, and only perform a regional lookup when claiming an ID. At the same time, we will now use a cached reader of the sqlLivenessProvider with the SQL instance storage. This is fine since the cached reader is on the conservative side, and will return true on IsAlive if it does not know. The only downside to this is that we leave instance rows lying around in the sql_instances table longer, but that is fine since they will eventually be reclaimed once the cache gets updated. Epic: None Release note (sql change): The `system.sql_instances` table now includes pre-allocated ID entries, where all the fields except `id` will be NULL. 91170: dev: add -flex-types argument for sqlite logic tests by default r=yuzefovich a=yuzefovich This commit adds a shortcut for specifying `-flex-types` test argument of the logic tests that is needed to run sqlite targets. It also defaults to passing this argument if only sqlite targets are specified. We do use this flag in the CI, so it reduces a possibility of confusion. Found when working on #58089. Epic: None Release note: None 91388: loqrecovery: prohibit plans with any descriptor changes r=tbg a=aliher1911 Previously, loss of quorum recovery allowed plan creation for cases where descriptor change didn't change survivor replica. This is not sufficient as replica will still be checked against its status and will panic when this entry is applied. This diff changes validation behaviour to treat this change as a problem and require force flag to override it. Release note: None Fixes #91271 Co-authored-by: Jay <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Oleg Afanasyev <[email protected]>
ajwerner
added a commit
to ajwerner/cockroach
that referenced
this issue
Nov 29, 2022
This solves bullet 4 of cockroachdb#85737 Release note: None
jaylim-crl
added a commit
to jaylim-crl/cockroach
that referenced
this issue
Nov 29, 2022
This solves bullet 4 of cockroachdb#85737. This commit makes sqlinstance.Start async, and not block until the rangefeed gets started. Epic: CRDB-18596 Release note: None Co-authored-by: Andrew Werner <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Nov 29, 2022
…92617 #92624 #92631 #92643 #92666 #92672 #92678 84866: jobs: introduce system.job_info table and helpers r=dt a=dt See commits. Design: #82638 Epic: None. 91554: cdc: add elastic CPU control to CDC event processing r=jayshrivastava a=jayshrivastava ### admission: make Pacer type available in SQL server config Currently, the Pacer type is only used within KV, but will be used by SQL in future changes. For example, code for encoding/decoding CDC events resides in distSQL and is CPU intensive, so there is a plan to integrate admission control to it in (#90089). This change makes the Pacer type available to the SQL layer via the `execinfra.ServerConfig`. Because the Pacer was previously only used by KV, it lived in the `kvadmission` package. Since this change makes it available outside of KV, it is moved to the `admission` package. Furthermore, this change adds a new method, `ElasticCPUGrantCoordinator.NewPacer`, to instantiate new Pacer structs. Since the `ElasticCPUGrantCoordinator` implements several features not relevant to SQL, this change passes the coordinator to the SQL server config as the interface `PacerMaker`, which makes only the `NewPacer` method accessible. Currently tenant servers do not create grant coordinators for admission control. This change retains that behavior, except it passes a `nil` `ElasticCPUGrandCoordinator` which creates `nil`/noop Pacers. Adding these coordinators to tenant servers is a change outside the scope of this commit and is left as a `TODO`. Release note: None ### cdc: add elastic CPU control to CDC event processing Previously, the CPU-bound work of CDC event processing (encoding / decoding rows) had the potential to consume a lot of CPU and disrupt foreground SQL traffic. This changes adds elastic CPU control to event processing so that it does not use excessive CPU and starve foreground traffic. This change also adds two new, non-public cluster settings, which control enabling/disabling CPU control for CDC event processing and controlling the requested grant size measured in CPU time. Fixes: #90089 Release note: None ### roachtest: add initial scan only case to elastic cdc Previously, this roachtest would not test changefeeds running with `initial_scan_only`. This option tends to have a significant impact on foreground latency due to high CPU usage, thus it should be included in this test which measures CPU usage and foreground latency while changefeeds are running. Release note: None 92293: sql: add TenantStatusServer.NodeLocality to support crdb_internal.ranges{_no_leases} for secondary tenants r=knz a=ecwall This PR adds a new node locality lookup KV admin function which is used by crdb_internal.ranges{_no_leases} and SHOW RANGES to work for secondary tenants. Release note: None Epic: CRDB-14522 92348: ci: add script to open pr to update bazel builder version r=rail a=healthy-pod Release note: None Part of: CRDB-11061 92470: admission: export elastic CPU utilization % as a metric, natively r=irfansharif a=irfansharif Fixes #89814. With the elastic CPU limiter (#86638) closely tracking acquired/returned CPU nanoseconds, it's possible to compute what % CPU utilization it's nominally overseeing. In experimentation we've been using prometheus where this CPU % is being computed using: ( rate(admission_elastic_cpu_acquired_nanos[$__rate_interval]) - rate(admission_elastic_cpu_acquired_nanos[$__rate_interval]) ) / admission_elastic_cpu_max_available_nanos This timeseries math is not possible in CRDB natively, but it's a useful one to have to observe the CPU limiter in action. This PR computes this number within CRDB and exports it as a metric. Below we see the two different forms of this metric, one computed as described above (smoother) and the version introduced in this PR. <img width="800" alt="image" src="https://user-images.githubusercontent.com/10536690/203867144-465e7373-8e40-4090-9772-32109eb70c7c.png"> Release note: None 92572: sqlstats: add rows_written to node_statement_statistics r=matthewtodd a=matthewtodd Fixes #91042 (No release note here since `node_statement_statistics` is not one of the ["Use in production" `crdb_internal` tables](https://www.cockroachlabs.com/docs/stable/crdb-internal.html#tables).) Release note: None 92582: kvcoord: Correctly handle stuck rangefeeds r=miretskiy a=miretskiy Fixes #92570 A watcher responsible for restarting stuck range feeds may incorrectly cancel rangefeed if the downstream event consumer is slow. Release note (bug fix): Fix incorrect cancellation logic when attempting to detect stuck range feeds. 92602: server: include time units in app protos comments r=xinhaoz a=xinhaoz Closes #89976 This commit includes the unit of time in the comments of fields in app_stats protos where the unit of time was not specified. Release note: None 92612: ui: add link on txn insight details fingerprint r=maryliag a=maryliag Previously, the fingerprint id showing on the contention table inside the transaction insights details didn't have a link to the fingerprint details page. This commit adds the link, including the proper setTimeScale to use the start time of the selected transaction. Fixes #91291 https://www.loom.com/share/53055cfc6b494e1bb7d11bba54252b22 Release note (ui change): Add link on fingerprint ID on high contention table inside Transaction Insights Details page. 92617: sql: Fix create partial stats test cases r=michae2 a=faizaanmadhani This commit modifies the create partial statistics test cases to ensure that the full table statistics that are to be used to create a partial statistic exist in the cache. Previously, this was not the case so certain tests had non-deterministic outputs causing failures in stress tests. Resolves: #92495 and #92559 Epic: CRDB-19449 Release note: None 92624: server_test: make TestClusterVersionUpgrade error more informative r=renatolabs a=andyyang890 This patch changes the error returned by TestClusterVersionUpgrade when auto upgrade does not successfully complete to provide the current version of the cluster instead of the old version. Epic: None Release note: None 92631: logictest: ensure lower bound on bytes limit for sqlite r=yuzefovich a=yuzefovich This commit makes it so that `defaultBatchBytesLimit` is set to at least 3KiB when running sqlite logic tests since if that value is too low, the tests can take really long time (in one example with value of 163 bytes it took 25 minutes vs 3 minutes with 2.5KiB value). This is achieved by explicitly updating this single metamorphic value when run with the sqlite target. I chose this option rather than forcing production values knob (which would disable some other randomizations) to have the smallest possible reduction in test coverage while still making the tests fast enough. Fixes: #92534. Release note: None 92643: storage: fix a bug with reverse scans, multiple column families, and max keys r=yuzefovich a=yuzefovich This commit fixes a bug with how we're checking whether the last row has final column family when performing a reverse scan. Previously, we'd incorrectly treat the largest column ID as the "last" one, but with the reverse iteration actually the zeroth column family is the "last" one. As a result, we could return an incomplete row to SQL. However, there is no production impact to this bug because no user at the SQL layer currently satisfies all the conditions: - `WholeRowsOfSize` option must be used. Currently, it is only used by the streamer; - the reverse scan must be requested and `MaxSpanRequestKeys` must be set - neither is currently done by the streamer. Epic: None Release note: None 92666: sqlinstance: make Start async, use in-memory copy of self to bootstrap r=ajwerner a=jaylim-crl This solves bullet 4 of #85737. This commit makes sqlinstance.Start async, and not block until the rangefeed gets started. Epic: [CRDB-18596](https://cockroachlabs.atlassian.net/browse/CRDB-18596) Release note: None 92672: spanconfigreporter: fix up the add-leaktest invocation r=rail a=knz This was causing an error in CI. Release note: None 92678: importer: use 1GiB max-sql-memory in a test r=yuzefovich a=yuzefovich Fixes: #92225. Release note: None Co-authored-by: David Taylor <[email protected]> Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: Evan Wall <[email protected]> Co-authored-by: healthy-pod <[email protected]> Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Matthew Todd <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]> Co-authored-by: maryliag <[email protected]> Co-authored-by: Faizaan Madhani <[email protected]> Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Jay <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Is your feature request related to a problem? Please describe.
The
sqlinstance
subsystem deals with allocating SQL instance IDs to sql servers and disseminating information about locality and network addresses of live instances among the sql servers. The existing startup process requires a number of synchronous WAN round-trips. The protocol proceeds as follows:sqlliveness
sessionDescribe the solution you'd like
We'll need to solve each of these steps somewhat distinctly. This issue tracks the work required to make 2 efficient and 4 asynchronous.
We need to makes the data scanned and the writes involved involved in the process the claiming of an instance only need to interact with data local to the current region. To do this, we propose partitioning the
system.sql_instances
table as aREGIONAL BY ROW
table.We need to make sure that the just by scanning the local region's partition of the table we can claim an ID. To do this, we'll have some background process pre-populate each region with some unclaimed IDs.
We need to eliminate the code whereby today we check if any claimed entries are expired. If we just eliminate this, then IDs will never be reclaimed. Given we proposed a background process above, we can delegate the work to cleaning up expired claims to that same background task.
In order to make process 4 asynchronous, we need to augment the implementation of the
sqlinstance.AddressResolver
to know about the current process'sInstance
as soon as it has been claimed, and then make the rest of the addresses available only asynchronously.Additional context
This will rely on being able to partition system database tables using MR syntax. This is a part of #85612.
Epic: CRDB-18596
Jira issue: CRDB-18411
The text was updated successfully, but these errors were encountered: