docs/rfcs: split up jobs system tables #82638
Conversation
# claim/backoff fields determine who should/when to try execute this job
claim_session_id BYTES,
claim_instance_id INT,
backoff_num_attempts INT,
This column might not be required: I believe ideally we'd let the job itself manage its own retries and any backoff between them that it thinks it should add, storing that state in its own info rows. The backoff at the jobs-system level would then only apply if the node running the job crashed before the invoked job-specific function returned, in which case we should wait some interval before trying again, to avoid a job of death crashing all nodes in quick succession.
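Purely as an illustration of that idea, here is a minimal sketch of a job recording its own retry state in an info row; the `value` column name and the JSON payload are assumptions, not part of the proposed schema.

```sql
-- Hypothetical sketch: the job, not the jobs system, tracks its retry/backoff
-- state by writing it into its own info row. The `value` column and the JSON
-- payload shape are illustrative assumptions.
INSERT INTO system.job_info (job_id, info_id, value)
VALUES ($1, 'retry_state', '{"attempts": 3, "next_retry": "2022-06-08T12:00:00Z"}');
```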
#### SHOW JOBS

The job control table, along with these tables plus the intake table, defines the set of jobs the system is running, will run or has has failed or creased running, along with human-readable information about them, such as their description, percentage complete, or recent status messages; taken together, these what an operator wants to know when they ask "what jobs is my cluster running? How complete are the job I was running? Did they fail and if so why?" This interaction is fulfilled by SHOW JOBS and the jobs page on the DB console. Both of these are powered by an internal virtual table crdb_internal.jobs, and will continue to be going forward, however this table will now be reading columns only from the control table and the status tables (plus the intake table). Rendering the virtual table will _not_ require decoding large proto payloads, that also contain large amounts of jon-specific state, nor will it block on transactions that are manipulating that state, as the virtual table will not depend on the info table, described below, where that state now resides and is manipulated.
In the jobs_control table description above, you say, "the [job control table] is exclusively used by the jobs system itself; no user-controlled or job-controlled transactions are allowed in this table". But here, you're saying that SHOW JOBS requires reading from the control table. Are these contradictory statements, or am I confused?
Clarified in meeting: "user-controlled or job-controlled transactions" implies an explicit transaction. Implicit txns on the control table from SHOW JOBS are still cool.
Open question: can SHOW JOBS get wrapped in a user's explicit txn.
SHOW JOBS, PAUSE JOB, etc. are all within the scope of the "jobs system" in my book, as compared to, say, a backup job's individual planning or checkpoint code.

SHOW JOBS can be, but it's a read, so I think that's fine.
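To make the SHOW JOBS discussion above concrete, here is a minimal sketch of how crdb_internal.jobs could be rendered from the control and status tables alone; the `message`, `fraction`, and `last_updated` column names are assumptions rather than the RFC's final schema.

```sql
-- Hypothetical sketch: job rows are rendered from small, human-readable columns
-- only; system.job_info (and any large job-specific state) is never read here.
SELECT c.id,
       c.claim_instance_id,
       (SELECT message
        FROM system.job_status_message
        WHERE job_id = c.id
        ORDER BY last_updated DESC
        LIMIT 1) AS recent_status,
       (SELECT fraction
        FROM system.job_status_progress
        WHERE job_id = c.id
        ORDER BY last_updated DESC
        LIMIT 1) AS fraction_completed
FROM system.job_control AS c;
```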
)
CREATE TABLE system.job_status_message (
  job_id INT NOT NULL,
  last_updated TIMESTAMPTZ,
Per the meeting explanation, it could be worth spelling out why the job_status_message table is separated from the control table: this is an attempt to prevent a single row from getting too large (i.e. filling up a single range) and to allow for a history of statuses over time.
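A minimal sketch of that append-only pattern, assuming a `message` column alongside the columns shown above (the column name is an assumption):

```sql
-- Hypothetical sketch: each status update is appended as a new row, keeping any
-- single row small and preserving a history of messages over time.
INSERT INTO system.job_status_message (job_id, last_updated, message)
VALUES ($1, now(), 'waiting for GC TTL');

-- The most recent status message for a job.
SELECT message
FROM system.job_status_message
WHERE job_id = $1
ORDER BY last_updated DESC
LIMIT 1;
```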
I'm not intimately familiar with all of the problems with the singular system.jobs table, but found the rationale behind the separation into multiple tables understandable. I'll leave it to others to give official sign-off on this RFC.
CREATE TABLE system.job_info (
  job_id INT NOT NULL,
  info_id STRING NOT NULL,
info_id -> info_name / info_description?
Reviewed 1 of 1 files at r2, all commit messages.
docs/RFCS/20220608_split_jobs_system_tables.md
line 11 at r2 (raw file):
The background job execution framework within CockroachDB currently stores its state and the state of the jobs it runs in columns of the `system.jobs` table.
What is the effect of this RFC on the scheduled jobs infrastructure? It's not mentioned, so it doesn't sound like we're proposing any changes to the scheduled_jobs table itself. Do the benefits listed here translate to scheduled jobs execution? I imagine some do. On the other hand, scheduled_jobs.next_run is still mutable, so this may not solve all of the contention issues we've seen during scheduled jobs execution.
docs/RFCS/20220608_split_jobs_system_tables.md
line 73 at r2 (raw file):
The current system.jobs table will be split into five separate tables: 1. `system.job_control` tracks the set of jobs, what step they need to take and coordinates which
Is now the time to consider introducing schemas into the system database to organize related tables under a set of high-level functions?
docs/RFCS/20220608_split_jobs_system_tables.md
line 81 at r2 (raw file):
4. `system.job_intake` is used to sidestep creation transaction contention in the control table. Each of these four tables serves a specific purpose and is described in more detail below, but two
nit: "but the two"
docs/RFCS/20220608_split_jobs_system_tables.md
line 229 at r2 (raw file):
system is running, will run or has has failed or creased running, along with human-readable information about them, such as their description, percentage complete, or recent status messages; taken together, these what an operator wants to know when they ask "what jobs is my cluster running?
nit: "these are what"
docs/RFCS/20220608_split_jobs_system_tables.md
line 235 at r2 (raw file):
reading columns only from the control table and the status tables (plus the intake table). Rendering the virtual table will _not_ require decoding large proto payloads, that also contain large amounts of jon-specific state, nor will it block on transactions that are manipulating that state, as the
nit: job-specific
docs/RFCS/20220608_split_jobs_system_tables.md
line 345 at r2 (raw file):
giant, and such cases should degrade gracefully rather than make the jobs tables unavailable. Thus, like the status tables above, updates to keys in this info table will instead be an insert of
Implementing MVCC over SQL solely for the purpose of allowing range splits between versions of the row feels unfortunate. That said, you make a strong argument for doing so, especially if this project lands while the MVCC GC TTL on this table is still 25h. How does this project sequence with the line of work to drop the default GC TTL? If we don't want one to depend on the other, could we use a lower default GC TTL specifically on this table, like we do for system.public.replication_stats?
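For reference, a per-table GC TTL override is a single zone-config statement (a sketch; the 600-second value is arbitrary):

```sql
-- Hypothetical sketch: give only the info table a short GC TTL, analogous to the
-- override used for system.public.replication_stats. The value is illustrative.
ALTER TABLE system.job_info CONFIGURE ZONE USING gc.ttlseconds = 600;
```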
docs/RFCS/20220608_split_jobs_system_tables.md
line 419 at r2 (raw file):
the only rows skipped by such a SKIP LOCKED query are only new rows
And rows in the process of being consumed from the queue.
docs/RFCS/20220608_split_jobs_system_tables.md
line 427 at r2 (raw file):
CREATE TABLE system.job_intake (
What are the ordering semantics of this queue? If two jobs are created for the same object, is there any guarantee about which one gets turned into a job first? Are we using mutual exclusion on some other table (e.g. system.descriptors) to prevent incompatible jobs from being queued?
docs/RFCS/20220608_split_jobs_system_tables.md
line 427 at r2 (raw file):
CREATE TABLE system.job_intake (
Is there observability into the state of this queue? Or some other way to prevent its asynchrony from confusing users? For instance, if a user creates a backup that hasn't made its way out of the jobs_intake table, what will they see when they run SHOW JOBS?
docs/RFCS/20220608_split_jobs_system_tables.md
line 428 at r2 (raw file):
CREATE TABLE system.job_intake (
id INT PRIMARY KEY,
Is this a sequence? How do we generate new IDs?
docs/RFCS/20220608_split_jobs_system_tables.md
line 450 at r2 (raw file):
WHERE id IN ( SELECT * FROM system.job_intake FOR UPDATE SKIP LOCKED LIMIT 100;
Without any ordering+predicate here, this is a full table scan over all MVCC garbage in the queue. Are we ok with that because we expect this queue to be very low throughput?
docs/RFCS/20220608_split_jobs_system_tables.md
line 450 at r2 (raw file):
LIMIT 100
nit: this batch size seems large for a low-throughput queue. My intuition is that batches this large will cause more harm than good, which is what we have seen with scheduled jobs execution. Consider dropping to something like 16.
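For reference, a minimal sketch of what the consuming transaction could look like with the smaller batch size; the mutating CTE and the single-column INSERT into the control table are illustrative assumptions, not the RFC's final query.

```sql
-- Hypothetical sketch: claim up to 16 queued rows, skipping rows locked by other
-- consumers, delete them from the intake queue, and move them into the control
-- table in a single transaction. Other control-table columns are elided.
WITH claimed AS (
  DELETE FROM system.job_intake
  WHERE id IN (
    SELECT id
    FROM system.job_intake
    FOR UPDATE SKIP LOCKED
    LIMIT 16
  )
  RETURNING id
)
INSERT INTO system.job_control (id)
SELECT id FROM claimed;
```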
docs/RFCS/20220608_split_jobs_system_tables.md
line 345 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Implementing MVCC over SQL solely for the purpose of allowing range splits between versions of the row feels unfortunate. That said, you make a strong argument for doing so, especially if this project lands while the MVCC GC TTL on this table is still 25h. How does this project sequence with the line of work to drop the default GC TTL? If we don't want one to depend on the other, could we use a lower default GC TTL specifically on this table, like we do for system.public.replication_stats?
Lowering the TTL on this table won't save us since it needs to be backed up, so even if the default TTL is 5min, if the backup cadence is hourly, the protected timestamp will keep it for an hour until that backup runs.
If someone has an 8 MiB proto they're stuffing into an info record and revising it every 30s, then an hourly backup cadence means we have at least 120 revisions, so almost 2x the range size. Of course it'd be better not to make a single, giant 8 MiB proto, but I sorta want to be pessimistic here and assume that'll happen one way or another and just defend against it.
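Spelling out that arithmetic (assuming the default 512 MiB max range size): an hour of 30-second revisions is 3600 / 30 = 120 revisions, and 120 × 8 MiB = 960 MiB, i.e. nearly two full ranges.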
Scheduling note: only the job_info table is targeted for inclusion in 22.2, where it'll be available for opt-in use by the code of existing jobs. The new control, status/progress, and intake tables, and migrating the jobs system to them, are slated for prototyping (via a standalone simulator app) during the stability period, to then be ready for implementation in 23.1.
docs/RFCS/20220608_split_jobs_system_tables.md
line 179 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
nit: info_type is specified in the index
removed
docs/RFCS/20220608_split_jobs_system_tables.md
line 11 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What is the effect of this RFC on the scheduled jobs infrastructure? It's not mentioned, so it doesn't sound like we're proposing any changes to the scheduled_jobs table itself. Do the benefits listed here translate to scheduled jobs execution? I imagine some do. On the other hand, scheduled_jobs.next_run is still mutable, so this may not solve all of the contention issues we've seen during scheduled jobs execution.
No touching schedules directly, no. Given the layering boundaries we can just lump schedules in with all other creators of jobs, so the changes to keep creator txns from causing job-system contention should benefit jobs whether they are created by schedules or by humans, but nothing unique to schedules is being proposed here.
docs/RFCS/20220608_split_jobs_system_tables.md
line 73 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is now the time to consider introducing schemas into the system database to organize related tables under a set of high-level functions?
Eh, I don't know if it really matters; system tables are for use by code in the db, not browsing by humans, so I don't really know if it is worth the added complexity to mess with that just for the extra level of labeling hierarchy?
docs/RFCS/20220608_split_jobs_system_tables.md
line 321 at r2 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
info_id -> info_name / info_description?

How about info_key?
docs/RFCS/20220608_split_jobs_system_tables.md
line 428 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this a sequence? How do we generate new IDs?
Callers creating jobs pick them, same as they do job ids today (generally using the unique_rowid logic)
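For example (a sketch; the `description` column and its value are illustrative assumptions):

```sql
-- Hypothetical sketch: the creating transaction picks the id itself using the
-- unique_rowid() builtin, the same way job ids are generated today.
INSERT INTO system.job_intake (id, description)
VALUES (unique_rowid(), 'BACKUP DATABASE defaultdb');
```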
84866: jobs: introduce system.job_info table and helpers r=dt a=dt. See commits. Design: #82638. Epic: None.
docs/RFCS/20220608_split_jobs_system_tables.md
line 130 at r1 (raw file):
Previously, msbutler (Michael Butler) wrote…
per meeting explanation. it could be worth spelling out why the job_status_message table is separated from control_table: this is an attempt to prevent a single row from getting too large (i.e. filling up a single range) and to allow for history of statuses over time.
Done.
docs/RFCS/20220608_split_jobs_system_tables.md
line 427 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What are the ordering semantics of this queue? If two jobs are created for the same object, is there any guarantee about which one gets turned into a job first? Are we using mutual exclusion on some other table (e.g. system.descriptors) to prevent incompatible jobs from being queued?
Yes, external. I've moved this whole intake queue section under a disclaimer that it is speculation and will require its own re-review before it is considered, so we can move forward with the well understood and more urgently needed info table now.
docs/RFCS/20220608_split_jobs_system_tables.md
line 427 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is there observability into the state of this queue? Or some other way to prevent its asynchrony from confusing users? For instance, if a user creates a backup that hasn't made its way out of the jobs_intake table, what will they see when they run SHOW JOBS?
I don't think you'd see it in SHOW JOBS until it is in the job table, as SHOW JOBS should only show jobs that the system actually thinks it is running, i.e. those in the control table.

Maybe a creation statement should poll the queue to see when to return, so that creation appears atomic.

We're shelving this whole section to avoid resolving questions like this right now, though, so we can move forward with the more urgent parts.
docs/RFCS/20220608_split_jobs_system_tables.md
line 450 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
LIMIT 100
nit: this batch size seems large for a low-throughput queue. My intuition is that batches this large will cause more harm than good, which is what we have seen with scheduled jobs execution. Consider dropping to something like 16.
done.
docs/RFCS/20220608_split_jobs_system_tables.md
line 450 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Without any ordering+predicate here, this is a full table scan over all MVCC garbage in the queue. Are we ok with that because we expect this queue to be very low throughput?
I think that expectation is correct, but it deserves critical consideration. Re: above, though, we're not doing this section now.
Reviewed 1 of 1 files at r3, all commit messages.
docs/RFCS/20220608_split_jobs_system_tables.md
line 72 at r1 (raw file):
Previously, dt (David Taylor) wrote…
we should add first run time here to account for intake lag
did we?
docs/RFCS/20220608_split_jobs_system_tables.md
line 82 at r1 (raw file):
Previously, dt (David Taylor) wrote…
This column might not be required: I believe ideally we'd let the job itself manage its own retries and any backoff between them it thinks it should add, storing that state in its own info rows, so the backoff at the jobs system level would just be if the node running it crashed before the invoked job-specific function returned, then we should wait some interval before trying again to avoid a job of death crashing all nodes in quick succession.
Could you import the list of columns we ended up implementing into the RFC so it reflects the current situation?
docs/RFCS/20220608_split_jobs_system_tables.md
line 139 at r1 (raw file):
Previously, dt (David Taylor) wrote…
Fair. I was hoping to punt on many of the implementation nitty gritty, particularly for the new control stuff, initially; our asap priority is to add the new info table, just as an available opt-in tool for existing jobs to use if they want to, with the overall control overhaul being sketched here just to make sure it'll all fit together when we're done
Maybe it's time to have one or two example queries here. As Aditya's experience has shown, these are not trivial.
docs/RFCS/20220608_split_jobs_system_tables.md
line 190 at r1 (raw file):
Previously, dt (David Taylor) wrote…
We should also limit the max size of a single row.
Did we?
docs/RFCS/20220608_split_jobs_system_tables.md
line 2 at r3 (raw file):
- Feature Name: Splitting Up the Jobs System Tables - Status: draft
Change the status.
docs/RFCS/20220608_split_jobs_system_tables.md
line 5 at r3 (raw file):
- Start Date: 2022-06-08 - Authors: David Taylor, Steven Danna, Yevgeniy Miretskiy - RFC PR: (PR # after acceptance of initial draft)
Please update these header fields.
docs/RFCS/20220608_split_jobs_system_tables.md
line 76 at r3 (raw file):
nodes will do so. 2. `system.job_status_progress` and `system.job_status_message` record human-consumed progress and status changes
nit: period at the end of this list item and the following one.
docs/RFCS/20220608_split_jobs_system_tables.md
line 88 at r3 (raw file):
b) **there may be many rows per job in job_info. ** This change highly severable, allowing it to be implemented gradually: The transition to storing
This change is*
docs/RFCS/20220608_split_jobs_system_tables.md
line 132 at r3 (raw file):
CREATE TABLE system.job_control ( id INT PRIMARY KEY,
nit: maybe replace tabs by spaces.
docs/RFCS/20220608_split_jobs_system_tables.md
line 213 at r3 (raw file):
CREATE TABLE system.job_status_progress (
ditto my previous comment about tabs and importing the most recent schema in the code into the RFC.
docs/RFCS/20220608_split_jobs_system_tables.md
line 326 at r3 (raw file):
CREATE TABLE system.job_info (
ditto: update schema and tabs->spaces
docs/RFCS/20220608_split_jobs_system_tables.md
line 369 at r3 (raw file):
INSERT INTO system.jobs_info SELECT $1, $3, NOW(), $4 WHERE crdb_internal.sql_liveness_is_alive($2) IS TRUE
Did we end up implementing this? I think we did the claim below instead. It's worth clarifying in the text.
I've slightly adjusted this to reflect the fact that we won't be replacing the jobs table but just extending it, and that the other tables have new names. I think this still isn't quite 100% what we expect to implement, or indeed what is already implemented in some places, but I still think it'll be more useful actually in the repo than hanging out on a branch for another two years, so I think we should go ahead and merge as is and can amend more later if we're so inclined (though plenty of our RFCs aren't 1:1 with the code by the time they merge anyway).
TFTR! bors r+
Release note: none.
Epic: none.