docs/rfcs: split up jobs system tables #82638

Merged: 1 commit merged into cockroachdb:master on Sep 17, 2024

Conversation

@dt dt (Member) commented Jun 8, 2022:

Release note: none.

Epic: none.

@dt dt requested review from stevendanna, ajwerner and miretskiy June 8, 2022 22:19
@dt dt requested a review from a team as a code owner June 8, 2022 22:19

# claim/backoff fields determine who should/when to try execute this job
claim_session_id BYTES,
claim_instance_id INT,
backoff_num_attempts INT,
dt (Member Author):

This column might not be required: ideally, I believe, we'd let the job itself manage its own retries and any backoff between them it thinks it should add, storing that state in its own info rows. The backoff at the jobs-system level would then apply only if the node running the job crashed before the invoked job-specific function returned, in which case we should wait some interval before trying again, to avoid a job of death crashing all nodes in quick succession.
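To make that crash-only backoff concrete, here is a minimal sketch of what an adoption query under that model could look like; the `last_run` column, the backoff arithmetic, and the batch size are illustrative assumptions rather than part of the proposed schema:

```sql
-- Hypothetical sketch: re-adopt only jobs whose previous claimant's session is
-- dead and whose crash-backoff window (derived from prior attempts) has elapsed.
UPDATE system.job_control
SET    claim_session_id     = $1,
       claim_instance_id    = $2,
       backoff_num_attempts = backoff_num_attempts + 1
WHERE  NOT crdb_internal.sql_liveness_is_alive(claim_session_id)
AND    last_run + least(backoff_num_attempts, 10) * INTERVAL '30 seconds' < now()
LIMIT  10
RETURNING id;
```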


#### SHOW JOBS

The job control table, along with these tables plus the intake table, defines the set of jobs the system is running, will run, or has failed or ceased running, along with human-readable information about them, such as their description, percentage complete, or recent status messages; taken together, these are what an operator wants to know when they ask "what jobs is my cluster running? How complete are the jobs I was running? Did they fail, and if so why?" This interaction is fulfilled by SHOW JOBS and the jobs page in the DB Console. Both of these are powered by an internal virtual table, crdb_internal.jobs, and will continue to be going forward; however, this table will now read columns only from the control table and the status tables (plus the intake table). Rendering the virtual table will _not_ require decoding large proto payloads that also contain large amounts of job-specific state, nor will it block on transactions that are manipulating that state, as the virtual table will not depend on the info table, described below, where that state now resides and is manipulated.
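As a rough illustration of that claim, rendering the virtual table could look something like the sketch below; the column names (status, message, fraction, last_updated) are assumptions based on the draft schemas in this RFC, not a final definition:

```sql
-- Sketch only: one row per job from the control table, joined to the latest
-- human-readable status and progress rows, without touching system.job_info.
SELECT c.id,
       c.status,
       m.message  AS running_status,
       p.fraction AS fraction_completed
FROM system.job_control AS c
LEFT JOIN LATERAL (
  SELECT message FROM system.job_status_message
  WHERE job_id = c.id ORDER BY last_updated DESC LIMIT 1
) AS m ON true
LEFT JOIN LATERAL (
  SELECT fraction FROM system.job_status_progress
  WHERE job_id = c.id ORDER BY last_updated DESC LIMIT 1
) AS p ON true;
```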
@msbutler msbutler (Collaborator) commented Jun 15, 2022:

In the jobs_control table description above, you say, "the [job control table] is exclusively used by the jobs system itself; no user-controlled or job-controlled transactions are allowed in this table". But here, you're saying the SHOW JOBS requires reading from the control table. Are these contradictory statements, or am I confused?

Collaborator:

Clarified in meeting: "user-controlled or job-controlled transactions" implies an explicit transaction. Implicit txns on the control table from SHOW JOBS are still cool.

Open question: can SHOW JOBS get wrapped in a user's explicit txn?

dt (Member Author):

SHOW JOBS, PAUSE JOB, etc are all within the scope of the "jobs system" in my book, as compared to, say, a backup job's individual planning or checkpoint code.

SHOW JOBS can be, but it's a read, so I think that's fine

)
CREATE TABLE system.job_status_message (
job_id INT NOT NULL,
last_updated TIMESTAMPTZ,
msbutler (Collaborator):

Per the meeting explanation: it could be worth spelling out why the job_status_message table is separated from the control table: it's an attempt to prevent a single row from getting too large (i.e., filling up a single range) and to allow for a history of statuses over time.
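A minimal sketch of the append-only pattern this separation enables; column names beyond the quoted job_id and last_updated are assumptions:

```sql
-- Each status change is a new row rather than an UPDATE of one ever-growing row,
-- so history is retained and no single row (or range) grows without bound.
INSERT INTO system.job_status_message (job_id, last_updated, message)
VALUES ($1, now(), 'importing table data: file 3 of 12');

-- Most recent message for a job:
SELECT message
FROM system.job_status_message
WHERE job_id = $1
ORDER BY last_updated DESC
LIMIT 1;
```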



@petermattis petermattis (Collaborator) left a comment:

I'm not intimately familiar with all of the problems with the singular system.jobs table, but found the rationale behind the separation into multiple tables understandable. I'll leave it to others to give official sign-off on this RFC.

CREATE TABLE system.job_info (
job_id INT NOT NULL,
info_id STRING NOT NULL,
chengxiong-ruan (Contributor):

info_id -> info_name / info_description?

@nvanbenschoten nvanbenschoten (Member) left a comment:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @dt, @miretskiy, @msbutler, @petermattis, and @stevendanna)


docs/RFCS/20220608_split_jobs_system_tables.md line 11 at r2 (raw file):

The background job execution framework within CockroachDB currently stores its state and the state
of the jobs it runs in columns of the `system.jobs` table. 

What is the effect of this RFC on the scheduled jobs infrastructure? It's not mentioned, so it doesn't sound like we're proposing any changes to the scheduled_jobs table itself. Do the benefits listed here translate to scheduled jobs execution? I imagine some do. On the other hand, scheduled_jobs.next_run is still mutable, so this may not solve all of the contention issues we've seen during scheduled jobs execution.


docs/RFCS/20220608_split_jobs_system_tables.md line 73 at r2 (raw file):

The current system.jobs table will be split into five separate tables: 

1. `system.job_control` tracks the set of jobs, what step they need to take and coordinates which

Is now the time to consider introducing schemas into the system database to organize related tables under a set of high-level functions?


docs/RFCS/20220608_split_jobs_system_tables.md line 81 at r2 (raw file):

4. `system.job_intake` is used to sidestep creation transaction contention in the control table.

Each of these four tables serves a specific purpose and is described in more detail below, but two

nit: "but the two"


docs/RFCS/20220608_split_jobs_system_tables.md line 229 at r2 (raw file):

system is running, will run or has has failed or creased running, along with human-readable
information about them, such as their description, percentage complete, or  recent status messages;
taken together, these what an operator wants to know when they ask "what jobs is my cluster running?

nit: "these are what"


docs/RFCS/20220608_split_jobs_system_tables.md line 235 at r2 (raw file):

reading columns only from the control table and the status tables (plus the intake table). Rendering
the virtual table will _not_ require decoding large proto payloads, that also contain large amounts
of jon-specific state, nor will it block on transactions that are manipulating that state, as the

nit: job-specific


docs/RFCS/20220608_split_jobs_system_tables.md line 345 at r2 (raw file):

giant, and such cases should degrade gracefully rather than make the jobs tables unavailable.

Thus, like the status tables above, updates to keys in this info table will instead be an insert of

Implementing MVCC over SQL solely for the purpose of allowing range splits between versions of the row feels unfortunate. That said, you make a strong argument for doing so, especially if this project lands while the MVCC GC TTL on this table is still 25h. How does this project sequence with the line of work to drop the default GC TTL? If we don't want one to depend on the other, could we use a lower default GC TTL specifically on this table, like we do for system.public.replication_stats?
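For reference, the per-table override suggested here would go through the existing zone-config mechanism; the table name and TTL value below are illustrative only:

```sql
-- Illustrative: lower the MVCC GC TTL for just this table, as is already done
-- for a few other system tables.
ALTER TABLE system.public.job_info CONFIGURE ZONE USING gc.ttlseconds = 600;
```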


docs/RFCS/20220608_split_jobs_system_tables.md line 419 at r2 (raw file):

the only rows skipped by such a SKIP LOCKED query are only new rows

And rows in the process of being consumed from the queue.


docs/RFCS/20220608_split_jobs_system_tables.md line 427 at r2 (raw file):

CREATE TABLE system.job_intake (

What are the ordering semantics of this queue? If two jobs are created for the same object, is there any guarantee about which one gets turned into a job first? Are we using mutual exclusion on some other table (e.g. system.descriptors) to prevent incompatible jobs from being queued?


docs/RFCS/20220608_split_jobs_system_tables.md line 427 at r2 (raw file):

CREATE TABLE system.job_intake (

Is there observability into the state of this queue? Or some other way to prevent its asynchrony from confusing users? For instance, if a user creates a backup that hasn't made its way out of the jobs_intake table, what will they see when they run SHOW JOBS?


docs/RFCS/20220608_split_jobs_system_tables.md line 428 at r2 (raw file):

CREATE TABLE system.job_intake (
id INT PRIMARY KEY,

Is this a sequence? How do we generate new IDs?


docs/RFCS/20220608_split_jobs_system_tables.md line 450 at r2 (raw file):

WHERE id IN (
    SELECT * FROM system.job_intake
    FOR UPDATE SKIP LOCKED LIMIT 100;

Without any ordering+predicate here, this is a full table scan over all MVCC garbage in the queue. Are we ok with that because we expect this queue to be very low throughput?


docs/RFCS/20220608_split_jobs_system_tables.md line 450 at r2 (raw file):

LIMIT 100

nit: this batch size seems large for a low-throughput queue. My intuition is that batches this large will cause more harm than good, which is what we have seen with scheduled jobs execution. Consider dropping to something like 16.
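Combining the two suggestions above, the drain query might look like the sketch below; the outer DELETE ... RETURNING framing and the ordering column are assumptions, since the RFC only shows the inner fragment (and this section was later shelved):

```sql
-- Sketch only: order the scan, keep the batch small, and let SKIP LOCKED pass
-- over rows that other nodes are already consuming.
DELETE FROM system.job_intake
WHERE id IN (
  SELECT id
  FROM system.job_intake
  ORDER BY id
  LIMIT 16
  FOR UPDATE SKIP LOCKED
)
RETURNING id;
```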

@dt dt (Member Author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @dt, @miretskiy, @msbutler, @nvanbenschoten, @petermattis, and @stevendanna)


docs/RFCS/20220608_split_jobs_system_tables.md line 345 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Implementing MVCC over SQL solely for the purpose of allowing range splits between versions of the row feels unfortunate. That said, you make a strong argument for doing so, especially if this project lands while the MVCC GC TTL on this table is still 25h. How does this project sequence with the line of work to drop the default GC TTL? If we don't want one to depend on the other, could we use a lower default GC TTL specifically on this table, like we do for system.public.replication_stats?

Lowering the TTL on this table won't save us since it needs to be backed up, so even if the default TTL is 5min, if the backup cadence is hourly, the protected timestamp will keep it for an hour until that backup runs.

If someone has an 8MB proto they're stuffing into an info record and revises it every 30s, then an hourly backup cadence means we retain at least 120 revisions (3600s / 30s), i.e. nearly 1GB of revisions, so almost 2x the range size. Of course it'd be better to not make a single, giant 8MB proto, but I sorta want to be pessimistic here and assume that'll happen one way or another and just defend against it.

@dt dt (Member Author) left a comment:

Scheduling note: only the job_info table is targeted for inclusion in 22.2, where it'll be available for opt-in use by the code of existing jobs. The new control, status/progress, and intake tables, and migrating the jobs system to them, are slated for prototyping (via a standalone simulator app) during the stability period, to then be ready for implementation in 23.1.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, @dt, @miretskiy, @msbutler, @nvanbenschoten, @petermattis, and @stevendanna)


docs/RFCS/20220608_split_jobs_system_tables.md line 179 at r1 (raw file):

Previously, msbutler (Michael Butler) wrote…

nit: info_type is specified in the index

removed


docs/RFCS/20220608_split_jobs_system_tables.md line 11 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What is the effect of this RFC on the scheduled jobs infrastructure? It's not mentioned, so it doesn't sound like we're proposing any changes to the scheduled_jobs table itself. Do the benefits listed here translate to scheduled jobs execution? I imagine some do. On the other hand, scheduled_jobs.next_run is still mutable, so this may not solve all of the contention issues we've seen during scheduled jobs execution.

No, we're not touching schedules directly. Given the layering boundaries we can just lump schedules in with all other creators of jobs, so the changes that keep creator txns from causing jobs-system contention should benefit jobs whether they are created by schedules or by humans, but nothing unique to schedules is being proposed here.


docs/RFCS/20220608_split_jobs_system_tables.md line 73 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is now the time to consider introducing schemas into the system database to organize related tables under a set of high-level functions?

Eh, I don't know if it really matters; system tables are for use by code in the db, not browsing by humans, so I don't really know if it is worth the added complexity to mess with that just for the extra level of labeling hierarchy?


docs/RFCS/20220608_split_jobs_system_tables.md line 321 at r2 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

info_id -> info_name / info_description?

how about info_key?


docs/RFCS/20220608_split_jobs_system_tables.md line 428 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this a sequence? How do we generate new IDs?

Callers creating jobs pick them, same as they do job ids today (generally using the unique_rowid logic)
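A sketch of what that looks like from the caller's side; the non-id columns are placeholders, since the intake schema itself was shelved:

```sql
-- The creating transaction picks the id itself, same as job ids today.
INSERT INTO system.job_intake (id, created, payload)
VALUES (unique_rowid(), now(), $1)
RETURNING id;
```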

@shermanCRL shermanCRL requested a review from baoalvin1 November 7, 2022 18:52
craig bot pushed a commit that referenced this pull request Nov 29, 2022
…92617 #92624 #92631 #92643 #92666 #92672 #92678

84866: jobs: introduce system.job_info table and helpers r=dt a=dt

See commits.

Design: #82638

Epic: None.

@dt dt (Member Author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @baoalvin1, @chengxiong-ruan, @miretskiy, @msbutler, @nvanbenschoten, @petermattis, and @stevendanna)


docs/RFCS/20220608_split_jobs_system_tables.md line 130 at r1 (raw file):

Previously, msbutler (Michael Butler) wrote…

per meeting explanation. it could be worth spelling out why the job_status_message table is separated from control_table: this is an attempt to prevent a single row from getting too large (i.e. filling up a single range) and to allow for history of statuses over time.

Done.


docs/RFCS/20220608_split_jobs_system_tables.md line 427 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What are the ordering semantics of this queue? If two jobs are created for the same object, is there any guarantee about which one gets turned into a job first? Are we using mutual exclusion on some other table (e.g. system.descriptors) to prevent incompatible jobs from being queued?

Yes, external. I've moved this whole intake queue section under a disclaimer that it is speculation and will require its own re-review before it is considered, so we can move forward with the well understood and more urgently needed info table now.


docs/RFCS/20220608_split_jobs_system_tables.md line 427 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is there observability into the state of this queue? Or some other way to prevent its asynchrony from confusing users? For instance, if a user creates a backup that hasn't made its way out of the jobs_intake table, what will they see when they run SHOW JOBS?

I don't think you'd see it in SHOW JOBS until it is in the jobs table, as SHOW JOBS should only show jobs that the system actually thinks it is running, i.e. those in the control table.
Maybe a creation statement should poll the queue to see when to return, so that creation appears atomic.

We're shelving this whole section to avoid resolving questions like this right now, though, so we can move forward with the more urgent parts.


docs/RFCS/20220608_split_jobs_system_tables.md line 450 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

LIMIT 100

nit: this batch size seems large for a low-throughput queue. My intuition is that batches this large will cause more harm than good, which is what we have seen with scheduled jobs execution. Consider dropping to something like 16.

Done.


docs/RFCS/20220608_split_jobs_system_tables.md line 450 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Without any ordering+predicate here, this is a full table scan over all MVCC garbage in the queue. Are we ok with that because we expect this queue to be very low throughput?

I think that's the correct expectation, but it deserves critical consideration. Re: above, though, we're not doing this section now.

@knz knz (Contributor) left a comment:

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @baoalvin1, @chengxiong-ruan, @dt, @miretskiy, @msbutler, @nvanbenschoten, @petermattis, and @stevendanna)


docs/RFCS/20220608_split_jobs_system_tables.md line 72 at r1 (raw file):

Previously, dt (David Taylor) wrote…

we should add first run time here to account for intake lag

did we?


docs/RFCS/20220608_split_jobs_system_tables.md line 82 at r1 (raw file):

Previously, dt (David Taylor) wrote…

This column might not be required: I believe ideally we'd let the job itself manage its own retries and any backoff between them it thinks it should add, storing that state in its own info rows, so the backoff at the jobs system level would just be if the node running it crashed before the invoked job-specific function returned, then we should wait some interval before trying again to avoid a job of death crashing all nodes in quick succession.

Could you import the list of columns we ended up implementing into the RFC so it reflects the current situation?


docs/RFCS/20220608_split_jobs_system_tables.md line 139 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Fair. I was hoping to punt on many of the implementation nitty gritty, particularly for the new control stuff, initially; our asap priority is to add the new info table, just as an available opt-in tool for existing jobs to use if they want to, with the overall control overhaul being sketched here just to make sure it'll all fit together when we're done

Maybe it's time to have one or two example queries here. As Aditya's experience has shown, these are not trivial.
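One or two such example queries, sketched under the append-only write model the RFC describes; info_key follows the rename discussed above, and the written/value column names are assumptions:

```sql
-- Write a new revision of a key (an insert, not an update, per the RFC).
INSERT INTO system.job_info (job_id, info_key, written, value)
VALUES ($1, 'checkpoint', now(), $2);

-- Read the latest revision of that key.
SELECT value
FROM system.job_info
WHERE job_id = $1 AND info_key = 'checkpoint'
ORDER BY written DESC
LIMIT 1;
```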


docs/RFCS/20220608_split_jobs_system_tables.md line 190 at r1 (raw file):

Previously, dt (David Taylor) wrote…

We should also limit the max size of a single row.

Did we?


docs/RFCS/20220608_split_jobs_system_tables.md line 2 at r3 (raw file):

- Feature Name: Splitting Up the Jobs System Tables
- Status: draft

Change the status.


docs/RFCS/20220608_split_jobs_system_tables.md line 5 at r3 (raw file):

- Start Date: 2022-06-08
- Authors: David Taylor, Steven Danna, Yevgeniy Miretskiy
- RFC PR: (PR # after acceptance of initial draft)

Please update these header fields.


docs/RFCS/20220608_split_jobs_system_tables.md line 76 at r3 (raw file):

   nodes will do so.
2. `system.job_status_progress` and `system.job_status_message` record human-consumed progress and
   status changes

nit: period at the end of this list item and the following one.


docs/RFCS/20220608_split_jobs_system_tables.md line 88 at r3 (raw file):

b) **there may be many rows per job in job_info. **

This change highly severable, allowing it to be implemented gradually: The transition to storing 

This change is*


docs/RFCS/20220608_split_jobs_system_tables.md line 132 at r3 (raw file):

CREATE TABLE system.job_control (
	id			INT PRIMARY KEY,

nit: maybe replace tabs by spaces.


docs/RFCS/20220608_split_jobs_system_tables.md line 213 at r3 (raw file):

CREATE TABLE system.job_status_progress (

ditto my previous comment about tabs and importing the most recent schema in the code into the RFC.


docs/RFCS/20220608_split_jobs_system_tables.md line 326 at r3 (raw file):

CREATE TABLE system.job_info (

ditto: update schema and tabs->spaces


docs/RFCS/20220608_split_jobs_system_tables.md line 369 at r3 (raw file):

INSERT INTO system.jobs_info SELECT $1, $3, NOW(), $4 WHERE crdb_internal.sql_liveness_is_alive($2) IS TRUE

Did we end up implementing this? I think we did the claim below instead. It's worth clarifying in the text.
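A sketch of the claim-based alternative being referred to, offered as an assumption rather than the shipped code; it gates the info write on the writer still holding the claim recorded for the job:

```sql
-- Only write the info row if $2 is still the session holding the claim on job $1.
INSERT INTO system.job_info (job_id, info_key, written, value)
SELECT $1, $3, now(), $4
WHERE EXISTS (
  SELECT 1 FROM system.jobs
  WHERE id = $1 AND claim_session_id = $2
);
```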

@dt dt (Member Author) commented Sep 17, 2024:

I've slightly adjusted this to reflect the fact that we won't be replacing the jobs table but just extending it, and that the other tables have new names. I think this still isn't quite 100% what we expect to implement, or indeed have already implemented in some places, but I still think it'll be more useful in the repo than hanging out on a branch for another two years, so I think we should go ahead and merge as is and amend it later if we're so inclined (though plenty of our RFCs aren't 1:1 with the code by the time they merge anyway).

@dt dt (Member Author) commented Sep 17, 2024:

TFTR!

bors r+

@craig craig bot merged commit 40b4ad8 into cockroachdb:master Sep 17, 2024
22 of 23 checks passed
@dt dt deleted the jobs-rfc branch September 25, 2024 20:35