Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix post-reload trigger. #5104

Merged
merged 7 commits into from
Sep 15, 2022
Merged

Fix post-reload trigger. #5104

merged 7 commits into from
Sep 15, 2022

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Aug 26, 2022

Close #5102 - retriggering a failed task after reload should not cause weird job submission errors.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PRs raised to both master and the relevant maintenance branch.

@hjoliver hjoliver added the bug Something is wrong :( label Aug 26, 2022
@hjoliver hjoliver added this to the 8.0.2 milestone Aug 26, 2022
@hjoliver hjoliver requested a review from dpmatthews August 26, 2022 06:21
@hjoliver hjoliver self-assigned this Aug 26, 2022
@hjoliver
Copy link
Member Author

Works fine for the test case in back-compat and normal mode. I'm not sure why it doesn't affect both modes, on master.

@oliver-sanders
Copy link
Member

I'm not to sure about this one, definitely don't understand it yet.

Given this issue has cropped up at least twice I'm worried that this might not be the last we see of it.

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not currently fix the problem of the second submitted job having the old pre-reload config

@hjoliver
Copy link
Member Author

This does not currently fix the problem of the second submitted job having the old pre-reload config

Sorry, there was a second small change I'd neglected to commit and push to GH. It fixes what I think is probably the bug causing mysterious reload issues.

(Disclaimer: not extensively tested beyond the original example yet).

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, have confirmed Dave's example now works as expected

@MetRonnie MetRonnie linked an issue Aug 30, 2022 that may be closed by this pull request
@hjoliver hjoliver marked this pull request as ready for review September 7, 2022 05:53
@hjoliver
Copy link
Member Author

hjoliver commented Sep 7, 2022

@oliver-sanders - I think @MetRonnie is away - can you review or reassign, and consider getting this into 8.0.2?

@oliver-sanders
Copy link
Member

I'm worried that this big will keep resurfacing in different places since it's already cropped up a few times.

I took a look into re-implementing reload to update the tdef and modify the TaskProxy in place. This seems promising, however, is risky, too risky for 8.0.2, requires more thought.

I think this fix makes sense for now, from Ronnie's comment it looks like the test is not capturing the issue though?

@oliver-sanders oliver-sanders modified the milestones: cylc-8.0.2, cylc-8.0.3 Sep 12, 2022
@hjoliver
Copy link
Member Author

I'm worried that this big will keep resurfacing in different places since it's already cropped up a few times.

It's possible that other bugs will crop up that affect reload, of course. But this PR definitely fixes this particular bug.

@oliver-sanders
Copy link
Member

oliver-sanders commented Sep 15, 2022

It's possible that other bugs will crop up that affect reload, of course. But this PR definitely fixes this particular bug.

FYI the source of this bug is exactly the same as for previous reload fixes (i.e. two copies of the task, one pre and one post reload, one modified, one not) - #5102 (comment). This issue has already surfaced and been fixed in a couple of other places, my concern is that it could continue to crop up in more contexts.

For 8.0.x this is the right fix, anything else is time consuming and risky.

For 8.1.0 we should consider changing the reload implementation to modify the TaskProxy in place which would resolve the source of this issue.

@oliver-sanders oliver-sanders merged commit 3554401 into cylc:8.0.x Sep 15, 2022
@hjoliver
Copy link
Member Author

For 8.1.0 we should consider changing the reload implementation to modify the TaskProxy in place which would resolve the source of this issue.

Yes, point taken, but this PR fixes what is the fundamental bug in the current implementation: it sometimes caused the old task proxy not to be swapped out for the new one.

If we get a chance to reimplement for 8.1.0, good - otherwise this'll have to do.

I need to remind myself of why it is not done in place ...

@hjoliver hjoliver deleted the fix-5102 branch September 15, 2022 21:29
@oliver-sanders
Copy link
Member

From experimentation I can't see any reason for it not to be done in place, it's actually simpler.

The main thing to watch out for is cases where derived values are computed from the TaskDef and ensure they are updated by the reload.

wxtim added a commit to wxtim/cylc that referenced this pull request Oct 3, 2022
…l-failure.bk

* upstream/8.0.x:
  remote: ensure all remote commands use a platform config (cylc#5152)
  Db store force triggered (cylc#5023)
  Run GH Actions tests on push to `8.*.x` branches
  Auto bump dev version on release
  remote-install: add "ana/" to the default install list (cylc#5137)
  A no-flow task should not merge and retrigger incomplete children (cylc#5146)
  `log_vc_info`: Redirect diff straight to file to avoid blocking pipe (cylc#5139)
  fix reversed data-store edge source-target (cylc#5156)
  Fix Jinja2 support if HOME undefined.
  Assume Jinja2 might be used in global-tests.cylc.
  Fix post-reload trigger. (cylc#5104)
  Bump dev version
  Update changelog
  workflow_state xtrigger: infer run num
  Type annotations
  Prepare release 8.0.2
  Remove HOME Env Variable from get_remote_workflow_run_dir (cylc#5115)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reload broken in compatibility mode
3 participants