Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for cancelled jobs. #23

Open
wants to merge 190 commits into
base: main
Choose a base branch
from

Conversation

contrains
Copy link
Contributor

This pull request adds support for a cancelled job status. This allows the user to manually cancel a workflow.

Changes include:

  • ADD: JobStatus.CANCELLED.
  • CHANGE: Do not run jobs with a finalised status (equivalent to not Job.should_be_requeued)
  • CHANGE: Make Job.should_be_requeued a class property.
  • CHANGE: Sort Job methods alphabetically.

…. This entails refactoring the `@step` method to be a decorator or a decorator factory, depending on whether the first argument is `callable`.
…ather than direct references to `WorkflowStep` objects.
…f index or label was specified; false otherwise.)
@contrains
Copy link
Contributor Author

@prryplatypus the tests are failing with an error that I'm not sure is related to my code.

ergate/interrupt.py:24: error: Invalid index type "Signals" for "dict[Literal[Signals.SIGTERM, Signals.SIGINT], Callable[[int, FrameType | None], Any] | int | None]"; expected type "Literal[Signals.SIGTERM, Signals.SIGINT]" [index]

Thoughts?

@contrains
Copy link
Contributor Author

Also, idk why there are so many old commits in this PR since I branched off of main after pulling upstream into it.

ergate/job.py Outdated
def mark_aborted(self, message: str) -> None:
self.status = JobStatus.ABORTED

@property
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep this as a method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted it, so that's fine. But, why? It's merely a computed value from another property which doesn't have external calls, so wouldn't it serve well as a property?

Comment on lines 124 to 126
if job.should_be_requeued:
return

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for? This would be True for every job that's just coming through the queue, and would cause the worker to keep stopping 🤔

Copy link
Contributor Author

@contrains contrains Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. I forgot the not!

It's supposed to be if not job.should_be_requeued(), and the purpose of this is to catch jobs which have been set to CANCELLED in between steps, so that it does not process them anyway and then overwrite the CANCELLED status with RUNNING here: https://github.com/contrains/ergeats/blob/main/ergate/job_runner.py#L36

Copy link
Owner

@prryplatypus prryplatypus Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work if something gets cancelled in the state store (Netbox) if it doesn't get changed in the queue too, since this job object is obtained directly from the queue. The idea was to avoid an extra call to the state store every time a job was acquired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Where would you recommend this check being made then?

ruff format --check ergate
ruff format --diff ergate
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to undo this? :D

Copy link
Contributor Author

@contrains contrains Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, anyway. ;) I was leaving it in until this issue was resolved: #23 (comment)

Do you have any thoughts about that?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge your PR with that since it does indeed appear unrelated. However I won't release until I get it fixed (possibly in a separate PR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my code subsequently is shown to have any issues once that is resolved, I'll plan to make a new PR to address them.

@contrains contrains force-pushed the add-cancelled-job-status branch from 8eb472c to 6e985c3 Compare February 18, 2025 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants