-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor versioning API #237
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, had a question and a comment that I pretty much resolved myself.
int32 max_depth = 3; | ||
// Limits how many compatible sets will be returned. Specify 1 to only return the current | ||
// default major version set. 0 returns all sets. | ||
int32 max_sets = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit on max versions per set also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno if it's worth it. The most we would return now is 10k versions, if every set is maxed out, and that's still not a gigantor response.
This mostly exist to just have a "get only the current default" toggle.
87faccb
to
cb9b18c
Compare
e84fb09
to
e219a57
Compare
message CompatibleVersionSet { | ||
// A unique identifier for this version set | ||
string id = 1; | ||
// All the compatible versions, ordered from oldest to newest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find oldest to newest to be confusing since I can use the API to mark a previous version as the default which would not make it the "newest".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's what you consider "newest", it doesn't read that way to me.
I'd just add the the "default" will alway be the last version in the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, within a set, the order never changes. You only mark overall sets as defaults, and the comment for that is clear about it. There's no reason to have some kind of "inner" default, since all the versions are compatible, we always pick the latest one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, so there's no way to revert marking a build ID as "latest" (or default) in the set?
What if I mark a build as compatible in the set and later change my mind? What if the deployment for that build fails?
How do I tell the server to route requests in that set to a previous build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a promote-within-set op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that we should have a default
attribute on this message to mark the default in the set so we can leave the versions
array sorted by insertion time.
Another thing we might want to consider is to store some more information like when was each version added to the set and by who (using identity
in the request).
0cced3e
to
14e34f2
Compare
2f4d2c0
to
aa7bd92
Compare
@@ -774,6 +773,9 @@ message QueryWorkflowResponse { | |||
message DescribeWorkflowExecutionRequest { | |||
string namespace = 1; | |||
temporal.api.common.v1.WorkflowExecution execution = 2; | |||
// If set, the response will include which build id shall be used to process the pending | |||
// workflow task, if any. | |||
bool include_pending_workflow_task_build_id = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not include this in every response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more expensive than a normal describe op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this to populate will_use_build_id
? If so this can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
bool become_default = 5; | ||
} | ||
message UpdateWorkerBuildIdOrderingResponse {} | ||
oneof operation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these are exactly what we need but I feel like I'd have to look at the docs every time to know what each of these operations mean.
This is better IMHO:
message AddNewVersion {
string version = 1;
string compatible_with_version = 2;
bool default_for_queue = 3;
}
message PromoteExistingVersion {
string version = 1;
bool default_for_queue = 2;
}
There's one case that we might not want to allow: AddNewVersion(version, default_for_queue=false)
but I think the readability makes up for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be optimizing our gRPC API for readability over correctness. We make this easy to use / readable in our clients and tctl. The API should optimize for correctness, so that when we implement those things mistakes are easily prevented. Looking at the docstring is quick and easy in every editing environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be optimizing for readability wherever possible TBH.
Notice the consistency in my proposed message names, they start with verbs as one might expect for an operation, they follow the same format, etc.
I'd be down to see an alternative proposal that disallows AddNewVersion(version, default_for_queue=false)
and is easier to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm talking about relative values, and correctness over readability is I think pretty clear.
But, regardless, I'm really not following what's hard to read about this. I could just make the field names slightly longer and that would seem to do it. I think your version is largely more clear to you because you wrote it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say there are 3 issues with your proposed API:
- These 2 variants sound the same,
existing_version_id_in_set_to_promote
andpromote_version_id_within_set
. - Some variant names sound like operations while others don't (the ones that don't start with a verb).
- The term "default" is overloaded, there's a set default and a queue default, anywhere we mention
default
we should explicitly disambiguate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the field names a bit more clear to address this
* Simpler representation of version "graph" * Support id for each compatible set * Support for future WASM bundles
aa7bd92
to
9f3d295
Compare
// Version info of the worker who processed this workflow task, or missing if worker is not | ||
// using versioning. If present, the `build_id` field within supersedes `binary_checksum`, which | ||
// may be populated with the same value to preserve compatability. | ||
temporal.api.common.v1.WorkerVersionStamp worker_version = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is more clear:
// Version info of the worker who processed this workflow task, or missing if worker is not | |
// using versioning. If present, the `build_id` field within supersedes `binary_checksum`, which | |
// may be populated with the same value to preserve compatability. | |
temporal.api.common.v1.WorkerVersionStamp worker_version = 5; | |
// Version info of the worker who processed this workflow task, or missing if worker is not | |
// using versioning. If present, the `build_id` field within is also used as `binary_checksum`, | |
// which may be omitted in that case. (It also may be populated for backwards compatibility.) | |
temporal.api.common.v1.WorkerVersionStamp worker_version = 5; |
but I'm still a little confused: are we going to populate both indefinitely? how does the server know when it can stop writing binary_checksum into history?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll have to have it around for some deprecation period before we just eliminate it entirely?
message CompatibleVersionSet { | ||
// A unique identifier for this version set. Users don't need to understand or care about this | ||
// value, but it has value for debugging purposes. | ||
string id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to document anything about this value? I can say that it may not contain /
or :
. but maybe it's just distracting to document that since it's assigned by the server and clients should treat it as an opaque string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd probably just be distracting
@@ -100,6 +100,9 @@ message PendingWorkflowTaskInfo { | |||
google.protobuf.Timestamp original_scheduled_time = 3 [(gogoproto.stdtime) = true]; | |||
google.protobuf.Timestamp started_time = 4 [(gogoproto.stdtime) = true]; | |||
int32 attempt = 5; | |||
// If set, this pending workflow task will need to be (or is currently being) handled by a | |||
// a worker using this build id. | |||
string will_use_build_id = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I don't remember talking about this.. what's the planned use case?
- what's the plan to implement it? I don't think it's obvious.
- even if this is needed, what do you think about adding the WorkerVersionStamp for the previous wft in WorkflowExecutionInfo? that's going to be in mutable state so it's free to copy in there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- IIRC I thought the reason for existence is to make it easier in the UI / CLI to show what the next task will use, and be able to easily do that "no compatible pollers" warning.... but as I write next that seems to be redundant.
- You tell me 🤷 -- if this is hard to do, we can eliminate it since this info is available via the
active_versions_and_pollers
field inGetWorkerBuildIdCompatabilityResponse
. This is technically slightly different since it shows last version / current compatible pollers rather than "next version"... but seems like we should probably just get rid of this unless we remember we need it for some reason and add it then. - Yes, I think this makes sense to add regardless
// set, that value should also be considered as the `binary_checksum`. | ||
temporal.api.taskqueue.v1.VersionId worker_versioning_id = 5; | ||
temporal.api.common.v1.WorkerVersionCapabilities worker_version_capabilities = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there going to be a capability flag for this? how should a versioned worker behave if it connects to a server that doesn't support versioning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is -
bool build_id_based_versioning = 6; |
Good question. We hadn't defined that. IMO the right thing to do is for the worker to throw errors on startup (we get capabilities on startup, and if you configured your worker to use versioning but the server can't do it, we can throw then)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a discussion this week we decided we want to be able to turn this on for individual namespaces, so we might need more than capabilities. We could figure that out in another PR
string build_id = 1; | ||
// Set if the worker used a dynamically loadable bundle to process | ||
// the task. The bundle could be a WASM blob, JS bundle, etc. | ||
string bundle_id = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is bundle_id included in build_id, i.e. does build_id definitely change if bundle_id changes? the compatible set stuff is all about build ids, not bundle ids at all, so if one build_id can use multiple bundles then it seems like this won't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was that any worker with a given build_id should be able to load up all bundle_ids that any other compatible worker (as defined by build_id compatibility) would also be able to load. Hence by construction it should slot into the idea of build_id compatibility.
To directly answer your question - no, build_id doesn't necessarily change if bundle_id changes. However, from a compat perspective it should be consistent. EX: If workers A and B have the same (or compatible) build_ids, this scenario can happen/work:
- A takes task one, replies build id
foo
and bundle_idb_1
- B takes next task, build id
foo.1
and bundleb_2
- A takes next task, still has build id
foo
, but maybe doesn't haveb_2
, but it can download it and it's expected that works, becausefoo
is compat withfoo.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does A know that it should use b_2
at that point? I know this feature isn't fully designed yet, just trying to make sure we won't have to make incompatible changes here later, e.g. needing to have bundle ids also (i.e. the whole WorkerVersionStamp) where this change just has build ids
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, because it simply looks at the history which includes the version stamp on the last WFT complete, and downloads that bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh duh, I'm thinking too server-centric
string build_id = 1; | ||
// Set if the worker used a dynamically loadable bundle to process | ||
// the task. The bundle could be a WASM blob, JS bundle, etc. | ||
string bundle_id = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh duh, I'm thinking too server-centric
What changed?
Why?
We need the ID to support separate physical task queues, which is required to avoid a backup in one version starving tasks on another version.
The graph representation was more complex than necessary.
Better support for future WASM bundles, now that we know we will be doing Rust SDK release soonish.
Breaking changes
Haha, yes - to an unreleased API.