docs/rfcs: split up jobs system tables #82638
Conversation
# claim/backoff fields determine who should/when to try execute this job
claim_session_id BYTES,
claim_instance_id INT,
backoff_num_attempts INT,
This column might not be required: I believe ideally we'd let the job itself manage its own retries and any backoff between them that it thinks it should add, storing that state in its own info rows. The backoff at the jobs-system level would then only apply if the node running the job crashed before the invoked job-specific function returned, in which case we should wait some interval before trying again, to avoid a job of death crashing all nodes in quick succession.
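Purely as an illustration of that idea, here is a minimal sketch of a job recording its own retry state in an info row; the `value` column name and the JSON payload are assumptions, not part of the proposed schema.

```sql
-- Hypothetical sketch: the job, not the jobs system, tracks its retry/backoff
-- state by writing it into its own info row. The `value` column and the JSON
-- payload shape are illustrative assumptions.
INSERT INTO system.job_info (job_id, info_id, value)
VALUES ($1, 'retry_state', '{"attempts": 3, "next_retry": "2022-06-08T12:00:00Z"}');
```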
#### SHOW JOBS

The job control table, along with these tables plus the intake table, defines the set of jobs the system is running, will run or has has failed or creased running, along with human-readable information about them, such as their description, percentage complete, or recent status messages; taken together, these what an operator wants to know when they ask "what jobs is my cluster running? How complete are the job I was running? Did they fail and if so why?" This interaction is fulfilled by SHOW JOBS and the jobs page on the DB console. Both of these are powered by an internal virtual table crdb_internal.jobs, and will continue to be going forward, however this table will now be reading columns only from the control table and the status tables (plus the intake table). Rendering the virtual table will _not_ require decoding large proto payloads, that also contain large amounts of jon-specific state, nor will it block on transactions that are manipulating that state, as the virtual table will not depend on the info table, described below, where that state now resides and is manipulated.
In the jobs_control table description above, you say, "the [job control table] is exclusively used by the jobs system itself; no user-controlled or job-controlled transactions are allowed in this table". But here, you're saying that SHOW JOBS requires reading from the control table. Are these contradictory statements, or am I confused?
Clarified in meeting: "user-controlled or job-controlled transactions" implies an explicit transaction. Implicit txns on the control table from SHOW JOBS are still cool.
Open question: can SHOW JOBS get wrapped in a user's explicit txn.
SHOW JOBS, PAUSE JOB, etc. are all within the scope of the "jobs system" in my book, as compared to, say, a backup job's individual planning or checkpoint code.

SHOW JOBS can be, but it's a read, so I think that's fine.
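To make the SHOW JOBS discussion above concrete, here is a minimal sketch of how crdb_internal.jobs could be rendered from the control and status tables alone; the `message`, `fraction`, and `last_updated` column names are assumptions rather than the RFC's final schema.

```sql
-- Hypothetical sketch: job rows are rendered from small, human-readable columns
-- only; system.job_info (and any large job-specific state) is never read here.
SELECT c.id,
       c.claim_instance_id,
       (SELECT message
        FROM system.job_status_message
        WHERE job_id = c.id
        ORDER BY last_updated DESC
        LIMIT 1) AS recent_status,
       (SELECT fraction
        FROM system.job_status_progress
        WHERE job_id = c.id
        ORDER BY last_updated DESC
        LIMIT 1) AS fraction_completed
FROM system.job_control AS c;
```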
)
CREATE TABLE system.job_status_message (
  job_id INT NOT NULL,
  last_updated TIMESTAMPTZ,
Per the meeting explanation, it could be worth spelling out why the job_status_message table is separated from the control table: this is an attempt to prevent a single row from getting too large (i.e. filling up a single range) and to allow for a history of statuses over time.
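A minimal sketch of that append-only pattern, assuming a `message` column alongside the columns shown above (the column name is an assumption):

```sql
-- Hypothetical sketch: each status update is appended as a new row, keeping any
-- single row small and preserving a history of messages over time.
INSERT INTO system.job_status_message (job_id, last_updated, message)
VALUES ($1, now(), 'waiting for GC TTL');

-- The most recent status message for a job.
SELECT message
FROM system.job_status_message
WHERE job_id = $1
ORDER BY last_updated DESC
LIMIT 1;
```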
I'm not intimately familiar with all of the problems with the singular system.jobs table, but found the rationale behind the separation into multiple tables understandable. I'll leave it to others to give official sign-off on this RFC.
CREATE TABLE system.job_info (
  job_id INT NOT NULL,
  info_id STRING NOT NULL,
info_id -> info_name / info_description?
Reviewed 1 of 1 files at r2, all commit messages.
docs/RFCS/20220608_split_jobs_system_tables.md
line 11 at r2 (raw file):
The background job execution framework within CockroachDB currently stores its state and the state of the jobs it runs in columns of the `system.jobs` table.
What is the effect of this RFC on the scheduled jobs infrastructure? It's not mentioned, so it doesn't sound like we're proposing any changes to the scheduled_jobs table itself. Do the benefits listed here translate to scheduled jobs execution? I imagine some do. On the other hand, scheduled_jobs.next_run is still mutable, so this may not solve all of the contention issues we've seen during scheduled jobs execution.
docs/RFCS/20220608_split_jobs_system_tables.md
line 73 at r2 (raw file):
The current system.jobs table will be split into five separate tables: 1. `system.job_control` tracks the set of jobs, what step they need to take and coordinates which
Is now the time to consider introducing schemas into the system database to organize related tables under a set of high-level functions?
docs/RFCS/20220608_split_jobs_system_tables.md
line 81 at r2 (raw file):
4. `system.job_intake` is used to sidestep creation transaction contention in the control table. Each of these four tables serves a specific purpose and is described in more detail below, but two
nit: "but the two"
docs/RFCS/20220608_split_jobs_system_tables.md
line 229 at r2 (raw file):
system is running, will run or has has failed or creased running, along with human-readable information about them, such as their description, percentage complete, or recent status messages; taken together, these what an operator wants to know when they ask "what jobs is my cluster running?
nit: "these are what"
docs/RFCS/20220608_split_jobs_system_tables.md
line 235 at r2 (raw file):
reading columns only from the control table and the status tables (plus the intake table). Rendering the virtual table will _not_ require decoding large proto payloads, that also contain large amounts of jon-specific state, nor will it block on transactions that are manipulating that state, as the
nit: job-specific
docs/RFCS/20220608_split_jobs_system_tables.md
line 345 at r2 (raw file):
giant, and such cases should degrade gracefully rather than make the jobs tables unavailable. Thus, like the status tables above, updates to keys in this info table will instead be an insert of
Implementing MVCC over SQL solely for the purpose of allowing range splits between versions of the row feels unfortunate. That said, you make a strong argument for doing so, especially if this project lands while the MVCC GC TTL on this table is still 25h. How does this project sequence with the line of work to drop the default GC TTL? If we don't want one to depend on the other, could we use a lower default GC TTL specifically on this table, like we do for system.public.replication_stats?
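For reference, a per-table GC TTL override is a single zone-config statement (a sketch; the 600-second value is arbitrary):

```sql
-- Hypothetical sketch: give only the info table a short GC TTL, analogous to the
-- override used for system.public.replication_stats. The value is illustrative.
ALTER TABLE system.job_info CONFIGURE ZONE USING gc.ttlseconds = 600;
```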
docs/RFCS/20220608_split_jobs_system_tables.md
line 419 at r2 (raw file):
the only rows skipped by such a SKIP LOCKED query are only new rows
And rows in the process of being consumed from the queue.
docs/RFCS/20220608_split_jobs_system_tables.md
line 427 at r2 (raw file):
CREATE TABLE system.job_intake (
What are the ordering semantics of this queue? If two jobs are created for the same object, is there any guarantee about which one gets turned into a job first? Are we using mutual exclusion on some other table (e.g. system.descriptors) to prevent incompatible jobs from being queued?
docs/RFCS/20220608_split_jobs_system_tables.md
line 427 at r2 (raw file):
CREATE TABLE system.job_intake (
Is there observability into the state of this queue? Or some other way to prevent its asynchrony from confusing users? For instance, if a user creates a backup that hasn't made its way out of the jobs_intake table, what will they see when they run SHOW JOBS?
docs/RFCS/20220608_split_jobs_system_tables.md
line 428 at r2 (raw file):
CREATE TABLE system.job_intake (
id INT PRIMARY KEY,
Is this a sequence? How do we generate new IDs?
docs/RFCS/20220608_split_jobs_system_tables.md
line 450 at r2 (raw file):
WHERE id IN ( SELECT * FROM system.job_intake FOR UPDATE SKIP LOCKED LIMIT 100;
Without any ordering+predicate here, this is a full table scan over all MVCC garbage in the queue. Are we ok with that because we expect this queue to be very low throughput?
docs/RFCS/20220608_split_jobs_system_tables.md
line 450 at r2 (raw file):
LIMIT 100
nit: this batch size seems large for a low-throughput queue. My intuition is that batches this large will cause more harm than good, which is what we have seen with scheduled jobs execution. Consider dropping to something like 16.
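For reference, a minimal sketch of what the consuming transaction could look like with the smaller batch size; the mutating CTE and the single-column INSERT into the control table are illustrative assumptions, not the RFC's final query.

```sql
-- Hypothetical sketch: claim up to 16 queued rows, skipping rows locked by other
-- consumers, delete them from the intake queue, and move them into the control
-- table in a single transaction. Other control-table columns are elided.
WITH claimed AS (
  DELETE FROM system.job_intake
  WHERE id IN (
    SELECT id
    FROM system.job_intake
    FOR UPDATE SKIP LOCKED
    LIMIT 16
  )
  RETURNING id
)
INSERT INTO system.job_control (id)
SELECT id FROM claimed;
```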
docs/RFCS/20220608_split_jobs_system_tables.md
line 345 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Implementing MVCC over SQL solely for the purpose of allowing range splits between versions of the row feels unfortunate. That said, you make a strong argument for doing so, especially if this project lands while the MVCC GC TTL on this table is still 25h. How does this project sequence with the line of work to drop the default GC TTL? If we don't want one to depend on the other, could we use a lower default GC TTL specifically on this table, like we do for system.public.replication_stats?
Lowering the TTL on this table won't save us since it needs to be backed up, so even if the default TTL is 5min, if the backup cadence is hourly, the protected timestamp will keep it for an hour until that backup runs.
If someone has an 8 MiB proto they're stuffing into an info record and revising it every 30s, then an hourly backup cadence means we have at least 120 revisions, so almost 2x the range size. Of course it'd be better not to make a single, giant 8 MiB proto, but I sorta want to be pessimistic here and assume that'll happen one way or another and just defend against it.
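Spelling out that arithmetic (assuming the default 512 MiB max range size): an hour of 30-second revisions is 3600 / 30 = 120 revisions, and 120 × 8 MiB = 960 MiB, i.e. nearly two full ranges.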
Scheduling note: only the job_info table is targeted for inclusion in 22.2, where it'll be available for opt-in use by the code of existing jobs. The new control, status/progress, and intake tables, and migrating the jobs system to them, are slated for prototyping (via a standalone simulator app) during the stability period, to then be ready for implementation in 23.1.
docs/RFCS/20220608_split_jobs_system_tables.md
line 179 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
nit: info_type is specified in the index
removed
docs/RFCS/20220608_split_jobs_system_tables.md
line 11 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What is the effect of this RFC on the scheduled jobs infrastructure? It's not mentioned, so it doesn't sound like we're proposing any changes to the scheduled_jobs table itself. Do the benefits listed here translate to scheduled jobs execution? I imagine some do. On the other hand, scheduled_jobs.next_run is still mutable, so this may not solve all of the contention issues we've seen during scheduled jobs execution.
No touching schedules directly, no. Given the layering boundaries we can just lump schedules in with all other creators of jobs, so the changes to keep creator txns from causing job-system contention should benefit jobs whether they are created by schedules or by humans, but nothing unique to schedules is being proposed here.
docs/RFCS/20220608_split_jobs_system_tables.md
line 73 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is now the time to consider introducing schemas into the system database to organize related tables under a set of high-level functions?
Eh, I don't know if it really matters; system tables are for use by code in the db, not browsing by humans, so I don't really know if it is worth the added complexity to mess with that just for the extra level of labeling hierarchy?
docs/RFCS/20220608_split_jobs_system_tables.md
line 321 at r2 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
info_id -> info_name / info_description?

How about info_key?
docs/RFCS/20220608_split_jobs_system_tables.md
line 428 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this a sequence? How do we generate new IDs?
Callers creating jobs pick them, same as they do job ids today (generally using the unique_rowid logic)
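For example (a sketch; the `description` column and its value are illustrative assumptions):

```sql
-- Hypothetical sketch: the creating transaction picks the id itself using the
-- unique_rowid() builtin, the same way job ids are generated today.
INSERT INTO system.job_intake (id, description)
VALUES (unique_rowid(), 'BACKUP DATABASE defaultdb');
```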
84866: jobs: introduce system.job_info table and helpers r=dt a=dt. See commits. Design: #82638. Epic: None.
docs/RFCS/20220608_split_jobs_system_tables.md
line 130 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
per meeting explanation. it could be worth spelling out why the job_status_message table is separated from control_table: this is an attempt to prevent a single row from getting too large (i.e. filling up a single range) and to allow for history of statuses over time.
Done.
docs/RFCS/20220608_split_jobs_system_tables.md
line 427 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What are the ordering semantics of this queue? If two jobs are created for the same object, is there any guarantee about which one gets turned into a job first? Are we using mutual exclusion on some other table (e.g. system.descriptors) to prevent incompatible jobs from being queued?
Yes, external. I've moved this whole intake queue section under a disclaimer that it is speculation and will require its own re-review before it is considered, so we can move forward with the well understood and more urgently needed info table now.
docs/RFCS/20220608_split_jobs_system_tables.md
line 427 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is there observability into the state of this queue? Or some other way to prevent its asynchrony from confusing users? For instance, if a user creates a backup that hasn't made its way out of the jobs_intake table, what will they see when they run SHOW JOBS?
I don't think you'd see it in SHOW JOBS until it is in the job table, as SHOW JOBS should only show jobs that the system actually thinks it is running, i.e. those in the control table.

Maybe a creation statement should poll the queue to see when to return, so that creation appears atomic.

We're shelving this whole section to avoid resolving questions like this right now, though, so we can move forward with the more urgent parts.
docs/RFCS/20220608_split_jobs_system_tables.md
line 450 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
LIMIT 100
nit: this batch size seems large for a low-throughput queue. My intuition is that batches this large will cause more harm than good, which is what we have seen with scheduled jobs execution. Consider dropping to something like 16.
done.
docs/RFCS/20220608_split_jobs_system_tables.md
line 450 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Without any ordering+predicate here, this is a full table scan over all MVCC garbage in the queue. Are we ok with that because we expect this queue to be very low throughput?
I think that expectation is correct, but it deserves critical consideration. Re: above, though, we're not doing this section now.
Reviewed 1 of 1 files at r3, all commit messages.
docs/RFCS/20220608_split_jobs_system_tables.md
line 72 at r1 (raw file):
Previously, dt (David Taylor) wrote…
we should add first run time here to account for intake lag
did we?
docs/RFCS/20220608_split_jobs_system_tables.md
line 82 at r1 (raw file):
Previously, dt (David Taylor) wrote…
This column might not be required: I believe ideally we'd let the job itself manage its own retries and any backoff between them it thinks it should add, storing that state in its own info rows, so the backoff at the jobs system level would just be if the node running it crashed before the invoked job-specific function returned, then we should wait some interval before trying again to avoid a job of death crashing all nodes in quick succession.
Could you import the list of columns we ended up implementing into the RFC so it reflects the current situation?
docs/RFCS/20220608_split_jobs_system_tables.md
line 139 at r1 (raw file):
Previously, dt (David Taylor) wrote…
Fair. I was hoping to punt on many of the implementation nitty gritty, particularly for the new control stuff, initially; our asap priority is to add the new info table, just as an available opt-in tool for existing jobs to use if they want to, with the overall control overhaul being sketched here just to make sure it'll all fit together when we're done
Maybe it's time to have one or two example queries here. As Aditya's experience has shown, these are not trivial.
docs/RFCS/20220608_split_jobs_system_tables.md
line 190 at r1 (raw file):
Previously, dt (David Taylor) wrote…
We should also limit the max size of a single row.
Did we?
docs/RFCS/20220608_split_jobs_system_tables.md
line 2 at r3 (raw file):
- Feature Name: Splitting Up the Jobs System Tables - Status: draft
Change the status.
docs/RFCS/20220608_split_jobs_system_tables.md
line 5 at r3 (raw file):
- Start Date: 2022-06-08 - Authors: David Taylor, Steven Danna, Yevgeniy Miretskiy - RFC PR: (PR # after acceptance of initial draft)
Please update these header fields.
docs/RFCS/20220608_split_jobs_system_tables.md
line 76 at r3 (raw file):
nodes will do so. 2. `system.job_status_progress` and `system.job_status_message` record human-consumed progress and status changes
nit: period at the end of this list item and the following one.
docs/RFCS/20220608_split_jobs_system_tables.md
line 88 at r3 (raw file):
b) **there may be many rows per job in job_info. ** This change highly severable, allowing it to be implemented gradually: The transition to storing
This change is*
docs/RFCS/20220608_split_jobs_system_tables.md
line 132 at r3 (raw file):
CREATE TABLE system.job_control ( id INT PRIMARY KEY,
nit: maybe replace tabs by spaces.
docs/RFCS/20220608_split_jobs_system_tables.md
line 213 at r3 (raw file):
CREATE TABLE system.job_status_progress (
ditto my previous comment about tabs and importing the most recent schema in the code into the RFC.
docs/RFCS/20220608_split_jobs_system_tables.md
line 326 at r3 (raw file):
CREATE TABLE system.job_info (
ditto: update schema and tabs->spaces
docs/RFCS/20220608_split_jobs_system_tables.md
line 369 at r3 (raw file):
INSERT INTO system.jobs_info SELECT $1, $3, NOW(), $4 WHERE crdb_internal.sql_liveness_is_alive($2) IS TRUE
Did we end up implementing this? I think we did the claim below instead. It's worth clarifying in the text.
I've slightly adjusted this to reflect the fact that we won't be replacing the jobs table but just extending it, and that the other tables have new names. I think this still isn't quite 100% what we expect to implement, or indeed what is already implemented in some places, but I still think it'll be more useful actually in the repo than hanging out on a branch for another two years, so I think we should go ahead and merge as is and can amend more later if we're so inclined (though plenty of our RFCs aren't 1:1 with the code by the time they merge anyway).
TFTR! bors r+
Release note: none.
Epic: none.