-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix post-reload trigger. #5104
Fix post-reload trigger. #5104
Conversation
Works fine for the test case in back-compat and normal mode. I'm not sure why it doesn't affect both modes, on master. |
I'm not to sure about this one, definitely don't understand it yet. Given this issue has cropped up at least twice I'm worried that this might not be the last we see of it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not currently fix the problem of the second submitted job having the old pre-reload config
Sorry, there was a second small change I'd neglected to commit and push to GH. It fixes what I think is probably the bug causing mysterious reload issues. (Disclaimer: not extensively tested beyond the original example yet). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, have confirmed Dave's example now works as expected
@oliver-sanders - I think @MetRonnie is away - can you review or reassign, and consider getting this into 8.0.2? |
tests/functional/reload/27-stall-retrigger/bin/stall-handler.sh
Outdated
Show resolved
Hide resolved
I'm worried that this big will keep resurfacing in different places since it's already cropped up a few times. I took a look into re-implementing reload to update the tdef and modify the TaskProxy in place. This seems promising, however, is risky, too risky for 8.0.2, requires more thought. I think this fix makes sense for now, from Ronnie's comment it looks like the test is not capturing the issue though? |
It's possible that other bugs will crop up that affect reload, of course. But this PR definitely fixes this particular bug. |
FYI the source of this bug is exactly the same as for previous reload fixes (i.e. two copies of the task, one pre and one post reload, one modified, one not) - #5102 (comment). This issue has already surfaced and been fixed in a couple of other places, my concern is that it could continue to crop up in more contexts. For 8.0.x this is the right fix, anything else is time consuming and risky. For 8.1.0 we should consider changing the reload implementation to modify the TaskProxy in place which would resolve the source of this issue. |
Yes, point taken, but this PR fixes what is the fundamental bug in the current implementation: it sometimes caused the old task proxy not to be swapped out for the new one. If we get a chance to reimplement for 8.1.0, good - otherwise this'll have to do. I need to remind myself of why it is not done in place ... |
From experimentation I can't see any reason for it not to be done in place, it's actually simpler. The main thing to watch out for is cases where derived values are computed from the |
…l-failure.bk * upstream/8.0.x: remote: ensure all remote commands use a platform config (cylc#5152) Db store force triggered (cylc#5023) Run GH Actions tests on push to `8.*.x` branches Auto bump dev version on release remote-install: add "ana/" to the default install list (cylc#5137) A no-flow task should not merge and retrigger incomplete children (cylc#5146) `log_vc_info`: Redirect diff straight to file to avoid blocking pipe (cylc#5139) fix reversed data-store edge source-target (cylc#5156) Fix Jinja2 support if HOME undefined. Assume Jinja2 might be used in global-tests.cylc. Fix post-reload trigger. (cylc#5104) Bump dev version Update changelog workflow_state xtrigger: infer run num Type annotations Prepare release 8.0.2 Remove HOME Env Variable from get_remote_workflow_run_dir (cylc#5115)
Close #5102 - retriggering a failed task after reload should not cause weird job submission errors.
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
andconda-environment.yml
.CHANGES.md
entry included if this is a change that can affect users