-
-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enum for worker TaskState names #5444
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this @fjetter
This includes a breaking change for the WorkerPlugin since the transition functions will no longer receive strings. We can make this backwards compatible but eventually I'd like to transition this to the enums as well.
Thanks for flagging this. I'd like to add backwards compatibility / deprecation code similar to what was done in #3875
cc @crusaderky as this seems like the kind of this you might enjoy reviewing
distributed/worker.py
Outdated
return self.name | ||
|
||
|
||
WTSName = WorkerTaskStateName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does removing Name
from WorkerTaskStateName
sufficiently reduce the need for this alias? I find using the full name much more readable
distributed/worker.py
Outdated
@@ -97,16 +99,41 @@ | |||
|
|||
no_value = "--no-value-sentinel--" | |||
|
|||
|
|||
class WorkerTaskStateName(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is shorter while, I think, conveying the same information
class WorkerTaskStateName(Enum): | |
class WorkerTaskState(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe WorkerTaskStatus
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this same enum is going to be soon used in scheduler.py too, may I suggest moving it to another module (core.py?) and rename it to e.g. TaskStateStatus (imported as TSS everywhere)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WorkerTaskStatus
well, it's not a status but I don't have a strong opinion there
Since this same enum is going to be soon used
I'm not entirely conviced that this should be shared among scheduler and worker. What I like about using two distinct enums is that the readability improves if there are scheduler and worker states around. The set of allowed states is also different.
v_args = v[1:] | ||
v_state = v[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but I'm curious why this change was needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks cosmetic. I don't mind either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy complains otherwise. below with type annotations
v : WTSName | tuple = recs.get(ts, (finish, *args))
v_state: WTSName
v_args: tuple
if isinstance(v, tuple):
# old; This will raise since the expression *v_args is of type list
v_state: WTSName, *v_args: list = v
# new
v_state: WTSName = v[0]
v_state: tuple = v[1:] # indexing a tuple this way yields a tuple
else:
v_state: WTSName, v_args: tuple = v, ()
distributed/stealing.py
Outdated
"waiting", | ||
WTSName.ready, | ||
WTSName.constrained, | ||
WTSName.waiting, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they're enums now, you could perhaps use a Flag
enum to encapsulate multiple states?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the benefit. Tasks themselves are not allowed to have multiple states. Therefore, there is the possibility to have a properly typed value assigned which is semantically nonsense / impossible. That's unlikely but doesn't encourage me in using it.
Therefore the only valid application will be when the rules engine verifies if a state is in a given set of states.
For this, it's basically just another syntax. Below the examples played out. set-like __contains__
still works so this will only be affecting the definition of the subsets.
In [1]: from enum import Enum, Flag, auto
In [2]: class TSSFlag(Flag):
...: fetch = auto()
...: flight = auto()
...: released = auto()
...: executing = auto()
...:
In [3]: class TSS(Enum):
...: fetch = auto()
...: flight = auto()
...: released = auto()
...: executing = auto()
...:
In [4]: to_be_gathered = {TSS.fetch, TSS.flight}
In [6]: to_be_gathered_flag = TSSFlag.fetch | TSSFlag.flight
In [7]: TSS.released in to_be_gathered
Out[7]: False
In [8]: TSS.flight in to_be_gathered
Out[8]: True
In [9]: TSSFlag.released in to_be_gathered_flag
Out[9]: False
In [10]: TSSFlag.flight in to_be_gathered_flag
Out[10]: True
In [11]: bool(TSSFlag.released & to_be_gathered_flag)
Out[11]: False
In [12]: bool(TSSFlag.flight & to_be_gathered_flag)
Out[12]: True
I get that bitwise comparison is faster than hashing but this speed difference is entirely negligible for the worker and I'm not entirely sure if it is relevant for the scheduler.
However, I don't see any significant downsides. If anybody has a reason to do so, I'm open to it but I consider the set-only, plain enum thing to be more "pythonic"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's mostly stylistic. I think checking membership of a set of values is the canonical use case for a flag enum over a regular enum. IIUC Flag
has been around since py36 so I think it would be fairly "pythonic" to use it where you don't have to support older Pythons.
Even though it may be negligible now, if there's no downside to doing so I'd always be inclined to go with the more performant solution as it may save you in future. However, your type-checker won't save you from setting a TSSFlag
attribute to a combined value (e.g. TSSFlag.fetch | TSSFlag.flight
) so that's a fairly big downside 🤔
@jrbourbeau @fjetter Just a random thought: python 3.10 adds StrEnums which might be backwards compatible with this. You can get StrEnums in python 3.6+ with: https://github.com/irgeek/StrEnum I believe (enums sounds good to me either way). |
while key not in dask_worker.tasks or dask_worker.tasks[key].state != state: | ||
await asyncio.sleep(0.005) | ||
|
||
|
||
async def wait_for_cancelled(key, dask_worker): | ||
async def wait_for_cancelled(key: str, dask_worker: Worker): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async def wait_for_cancelled(key: str, dask_worker: Worker): | |
async def wait_for_cancelled(key: str, dask_worker: Worker) -> None: |
Blocked by and incorporates #5426 |
It's not the same thing. It's not about setting the proper value on the state attribute. If users modify anything on that attribute, they are entirely on their own since I couldn't think of many things which are more low level and internal than this attribute. I'm rather talking about the distributed/distributed/worker.py Line 2292 in 3afc670
Our WorkerPlugin interface currently defines strings. It's very easy for us to simply continue passing string values in there but it is impossible to infer whether the users migrated to anything new or not. We can either raise a deprecation/future warning every time (too much spam; I'm strongly against this) or not warn at all. We might also simply decide to just keep using strings for anything user facing. A similar problem exists for the transition log. Right now, I'm logging the enums but we could instead log the names which might offer a better UX (although I'm not sure if anybody other than myself is actually using the transition log :) ) |
I'm not aware that py3.10 is introducing a new enum type. Looking at https://docs.python.org/3/whatsnew/3.10.html enums are only mentioned in an example regarding the new AFAIK, mixed type enums can be easily created by subclassing, e.g. In [15]: class TSSStr(str, Enum):
...: fetch = "fetch"
...: flight = "flight"
...: released = "released"
...: executing = "executing"
...:
In [16]: TSSStr.fetch == "fetch"
Out[16]: True |
3f0600d
to
2dadf51
Compare
2dadf51
to
bf894cc
Compare
@@ -131,6 +131,8 @@ class WorkerPlugin: | |||
>>> client.register_worker_plugin(plugin) # doctest: +SKIP | |||
""" | |||
|
|||
enum_task_state_names = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a backwards compatibility toggle for the plugin. There is now a class attribute controlling this behaviour and it default to the old behaviour. the old behaviour will then issue a deprecation warning instructing the user how to migrate.
Summarizing
|
ready = "ready" | ||
constrained = "constrained" | ||
executing = "executing" | ||
long_running = "long-running" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to cause headaches with name != value. Can we use _ in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(may need to change on the scheduler side too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it this way for compat purposes. The value is backwards compatible
@@ -3543,7 +3627,31 @@ def get_call_stack(self, comm=None, keys=None): | |||
def _notify_plugins(self, method_name, *args, **kwargs): | |||
for name, plugin in self.plugins.items(): | |||
if hasattr(plugin, method_name): | |||
if method_name == "release_key": | |||
if ( | |||
not getattr(plugin, "enum_task_state_names", False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not getattr(plugin, "enum_task_state_names", False) | |
not plugin.enum_task_state_names |
I think getattr is unnecessary - aren't all customer plugins supposed to subclass WorkerPlugin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically I don't think we've enforced this. That's why, for example, we have hasattr
checks like this
distributed/distributed/worker.py
Lines 1539 to 1543 in 1a2313c
teardowns = [ | |
plugin.teardown(self) | |
for plugin in self.plugins.values() | |
if hasattr(plugin, "teardown") | |
] |
Enforcing plugins to subclass WorkerPlugin
doesn't sound like a bad idea, but would merit a deprecation cycle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
bf894cc
to
be5c9cb
Compare
|
Is there anything outstanding here? |
Nothing outstanding other than resolving the merge conflicts. So far, this wasn't a priority of mine |
This sets up an enum to be used as the worker task state name. The name of this enum is up for debate, I couldn't find a shorter one since, eventually, we'll have the same for the scheudler and both enums will occasionally be active in the same module. This is, in fact, one reason why this is useful.
This includes a breaking change for the WorkerPlugin since the transition functions will no longer receive strings. We can make this backwards compatible but eventually I'd like to transition this to the enums as well.
Builds on #5426