Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor versioning API #237

Merged
merged 10 commits into from
Mar 13, 2023
Merged

Refactor versioning API #237

merged 10 commits into from
Mar 13, 2023

Conversation

Sushisource
Copy link
Member

What changed?

  • Simpler representation of version "graph"
  • Support id for each compatible set
  • Better support for future WASM bundles

Why?
We need the ID to support separate physical task queues, which is required to avoid a backup in one version starving tasks on another version.

The graph representation was more complex than necessary.

Better support for future WASM bundles, now that we know we will be doing Rust SDK release soonish.

Breaking changes
Haha, yes - to an unreleased API.

@Sushisource Sushisource requested review from a team as code owners September 14, 2022 21:44
Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, had a question and a comment that I pretty much resolved myself.

int32 max_depth = 3;
// Limits how many compatible sets will be returned. Specify 1 to only return the current
// default major version set. 0 returns all sets.
int32 max_sets = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limit on max versions per set also?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno if it's worth it. The most we would return now is 10k versions, if every set is maxed out, and that's still not a gigantor response.

This mostly exist to just have a "get only the current default" toggle.

message CompatibleVersionSet {
// A unique identifier for this version set
string id = 1;
// All the compatible versions, ordered from oldest to newest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find oldest to newest to be confusing since I can use the API to mark a previous version as the default which would not make it the "newest".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will do that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's what you consider "newest", it doesn't read that way to me.
I'd just add the the "default" will alway be the last version in the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, within a set, the order never changes. You only mark overall sets as defaults, and the comment for that is clear about it. There's no reason to have some kind of "inner" default, since all the versions are compatible, we always pick the latest one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, so there's no way to revert marking a build ID as "latest" (or default) in the set?

What if I mark a build as compatible in the set and later change my mind? What if the deployment for that build fails?
How do I tell the server to route requests in that set to a previous build?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a promote-within-set op

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that we should have a default attribute on this message to mark the default in the set so we can leave the versions array sorted by insertion time.
Another thing we might want to consider is to store some more information like when was each version added to the set and by who (using identity in the request).

@Sushisource Sushisource force-pushed the versioning-api-refactor branch from 0cced3e to 14e34f2 Compare October 26, 2022 17:16
@Sushisource Sushisource force-pushed the versioning-api-refactor branch from 2f4d2c0 to aa7bd92 Compare November 2, 2022 19:44
@@ -774,6 +773,9 @@ message QueryWorkflowResponse {
message DescribeWorkflowExecutionRequest {
string namespace = 1;
temporal.api.common.v1.WorkflowExecution execution = 2;
// If set, the response will include which build id shall be used to process the pending
// workflow task, if any.
bool include_pending_workflow_task_build_id = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not include this in every response?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more expensive than a normal describe op.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this to populate will_use_build_id? If so this can be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

bool become_default = 5;
}
message UpdateWorkerBuildIdOrderingResponse {}
oneof operation {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these are exactly what we need but I feel like I'd have to look at the docs every time to know what each of these operations mean.

This is better IMHO:

message AddNewVersion {
  string version = 1;
  string compatible_with_version = 2;
  bool default_for_queue = 3;
}

message PromoteExistingVersion {
  string version = 1;
  bool default_for_queue = 2;
}

There's one case that we might not want to allow: AddNewVersion(version, default_for_queue=false) but I think the readability makes up for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be optimizing our gRPC API for readability over correctness. We make this easy to use / readable in our clients and tctl. The API should optimize for correctness, so that when we implement those things mistakes are easily prevented. Looking at the docstring is quick and easy in every editing environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be optimizing for readability wherever possible TBH.
Notice the consistency in my proposed message names, they start with verbs as one might expect for an operation, they follow the same format, etc.
I'd be down to see an alternative proposal that disallows AddNewVersion(version, default_for_queue=false) and is easier to parse.

Copy link
Member Author

@Sushisource Sushisource Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about relative values, and correctness over readability is I think pretty clear.

But, regardless, I'm really not following what's hard to read about this. I could just make the field names slightly longer and that would seem to do it. I think your version is largely more clear to you because you wrote it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say there are 3 issues with your proposed API:

  1. These 2 variants sound the same, existing_version_id_in_set_to_promote and promote_version_id_within_set.
  2. Some variant names sound like operations while others don't (the ones that don't start with a verb).
  3. The term "default" is overloaded, there's a set default and a queue default, anywhere we mention default we should explicitly disambiguate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the field names a bit more clear to address this

@Sushisource Sushisource force-pushed the versioning-api-refactor branch from aa7bd92 to 9f3d295 Compare February 22, 2023 22:16
Comment on lines 196 to 199
// Version info of the worker who processed this workflow task, or missing if worker is not
// using versioning. If present, the `build_id` field within supersedes `binary_checksum`, which
// may be populated with the same value to preserve compatability.
temporal.api.common.v1.WorkerVersionStamp worker_version = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more clear:

Suggested change
// Version info of the worker who processed this workflow task, or missing if worker is not
// using versioning. If present, the `build_id` field within supersedes `binary_checksum`, which
// may be populated with the same value to preserve compatability.
temporal.api.common.v1.WorkerVersionStamp worker_version = 5;
// Version info of the worker who processed this workflow task, or missing if worker is not
// using versioning. If present, the `build_id` field within is also used as `binary_checksum`,
// which may be omitted in that case. (It also may be populated for backwards compatibility.)
temporal.api.common.v1.WorkerVersionStamp worker_version = 5;

but I'm still a little confused: are we going to populate both indefinitely? how does the server know when it can stop writing binary_checksum into history?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll have to have it around for some deprecation period before we just eliminate it entirely?

message CompatibleVersionSet {
// A unique identifier for this version set. Users don't need to understand or care about this
// value, but it has value for debugging purposes.
string id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to document anything about this value? I can say that it may not contain / or :. but maybe it's just distracting to document that since it's assigned by the server and clients should treat it as an opaque string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd probably just be distracting

@@ -100,6 +100,9 @@ message PendingWorkflowTaskInfo {
google.protobuf.Timestamp original_scheduled_time = 3 [(gogoproto.stdtime) = true];
google.protobuf.Timestamp started_time = 4 [(gogoproto.stdtime) = true];
int32 attempt = 5;
// If set, this pending workflow task will need to be (or is currently being) handled by a
// a worker using this build id.
string will_use_build_id = 6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't remember talking about this.. what's the planned use case?
  2. what's the plan to implement it? I don't think it's obvious.
  3. even if this is needed, what do you think about adding the WorkerVersionStamp for the previous wft in WorkflowExecutionInfo? that's going to be in mutable state so it's free to copy in there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. IIRC I thought the reason for existence is to make it easier in the UI / CLI to show what the next task will use, and be able to easily do that "no compatible pollers" warning.... but as I write next that seems to be redundant.
  2. You tell me 🤷 -- if this is hard to do, we can eliminate it since this info is available via the active_versions_and_pollers field in GetWorkerBuildIdCompatabilityResponse. This is technically slightly different since it shows last version / current compatible pollers rather than "next version"... but seems like we should probably just get rid of this unless we remember we need it for some reason and add it then.
  3. Yes, I think this makes sense to add regardless

// set, that value should also be considered as the `binary_checksum`.
temporal.api.taskqueue.v1.VersionId worker_versioning_id = 5;
temporal.api.common.v1.WorkerVersionCapabilities worker_version_capabilities = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there going to be a capability flag for this? how should a versioned worker behave if it connects to a server that doesn't support versioning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is -

bool build_id_based_versioning = 6;

Good question. We hadn't defined that. IMO the right thing to do is for the worker to throw errors on startup (we get capabilities on startup, and if you configured your worker to use versioning but the server can't do it, we can throw then)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a discussion this week we decided we want to be able to turn this on for individual namespaces, so we might need more than capabilities. We could figure that out in another PR

string build_id = 1;
// Set if the worker used a dynamically loadable bundle to process
// the task. The bundle could be a WASM blob, JS bundle, etc.
string bundle_id = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is bundle_id included in build_id, i.e. does build_id definitely change if bundle_id changes? the compatible set stuff is all about build ids, not bundle ids at all, so if one build_id can use multiple bundles then it seems like this won't work

Copy link
Member Author

@Sushisource Sushisource Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was that any worker with a given build_id should be able to load up all bundle_ids that any other compatible worker (as defined by build_id compatibility) would also be able to load. Hence by construction it should slot into the idea of build_id compatibility.

To directly answer your question - no, build_id doesn't necessarily change if bundle_id changes. However, from a compat perspective it should be consistent. EX: If workers A and B have the same (or compatible) build_ids, this scenario can happen/work:

  • A takes task one, replies build id foo and bundle_id b_1
  • B takes next task, build id foo.1 and bundle b_2
  • A takes next task, still has build id foo, but maybe doesn't have b_2, but it can download it and it's expected that works, because foo is compat with foo.1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does A know that it should use b_2 at that point? I know this feature isn't fully designed yet, just trying to make sure we won't have to make incompatible changes here later, e.g. needing to have bundle ids also (i.e. the whole WorkerVersionStamp) where this change just has build ids

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, because it simply looks at the history which includes the version stamp on the last WFT complete, and downloads that bundle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh duh, I'm thinking too server-centric

string build_id = 1;
// Set if the worker used a dynamically loadable bundle to process
// the task. The bundle could be a WASM blob, JS bundle, etc.
string bundle_id = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh duh, I'm thinking too server-centric

@Sushisource Sushisource merged commit 9206f15 into master Mar 13, 2023
@Sushisource Sushisource deleted the versioning-api-refactor branch March 13, 2023 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants