tests/k/hold-release/13-ready-restart.t not working #4052

Closed
MetRonnie opened this issue Jan 28, 2021 · 3 comments · Fixed by #4054

@MetRonnie
Member

Original issue that this test is for: #958

Related issue: #2610


$ etc/bin/run-functional-tests -v -p '*' tests/flakyfunctional/hold-release/13-ready-restart.t 

ok 1 - 13-ready-restart-validate
ok 2 - 13-ready-restart-run
ok 3 - 13-ready-restart-restart
ERROR: poll timed out: grep -s foo-1\.1.*succeeded ~/cylc-run/cylctb-20210128T164523Z/
    flakyfunctional/hold-release/13-ready-restart/log/suite/log
Dubious, test returned 1 (wstat 256, 0x100)
All 3 subtests passed 
[16:46:38]

Test Summary Report
-------------------
tests/flakyfunctional/hold-release/13-ready-restart.t (Wstat: 256 Tests: 3 Failed: 0)
  Non-zero exit status: 1
Files=1, Tests=3, 74 wallclock secs ( 0.03 usr  0.01 sys +  4.95 cusr  1.98 csys =  6.97 CPU)
Result: FAIL
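
For context, the `poll timed out` error above comes from the test repeatedly grepping the suite log for a line matching `foo-1\.1.*succeeded` and giving up once a deadline passes. A minimal Python sketch of that kind of poll-and-grep loop follows. The helper name `poll_grep` and the timeout values are assumptions made for illustration; the real test battery does this in shell.

```python
import re
import time
from pathlib import Path


def poll_grep(pattern: str, path: Path, timeout: float = 60.0,
              interval: float = 1.0) -> bool:
    """Return True once a line in `path` matches `pattern`, False on timeout.

    Hypothetical helper for illustration only; not part of Cylc.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        # The log may not exist yet if the suite is still starting up.
        if path.exists():
            with path.open() as handle:
                if any(regex.search(line) for line in handle):
                    return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In this run the log only ever contains `[foo-1.1] -submission failed`, so a loop like this can never match and has to time out.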

Suite log 1:

2021-01-28T16:45:36Z INFO - Cold Start 1
...
2021-01-28T16:45:46Z INFO - [foo-1.1] -submit-num=01, host=<localhost>
2021-01-28T16:45:46Z INFO - [foo-1.1] -triggered off ['foo.1']

Suite log 2:

2021-01-28T16:46:00Z INFO - LOADING suite parameters
...
2021-01-28T16:46:00Z INFO - + foo-1.1 ['try_timers', 'submission-retry']
2021-01-28T16:46:00Z INFO - + foo-1.1 ['try_timers', 'execution-retry']
2021-01-28T16:46:00Z INFO - Held on start-up (no tasks will be submitted)
2021-01-28T16:46:00Z INFO - Holding all waiting or queued tasks now
2021-01-28T16:46:00Z INFO - Suite held.
...
2021-01-28T16:46:00Z INFO - Run: (re)start=1 log=1
...
2021-01-28T16:46:02Z ERROR - localhost: initialisation did not complete:
	COMMAND FAILED (255): ssh -oBatchMode=yes -oConnectTimeout=10 localhost env CYLC_VERSION=8.0a3.dev
            CYLC_CONF_PATH=/var/tmp/tmp.1Pj0iZG0HC/etc bash --login -c ''"'"'exec "$0" "$@"'"'"'' cylc remote-init
            localhost '$HOME/cylc-run/cylctb-20210128T164523Z/flakyfunctional/hold-release/13-ready-restart'
	COMMAND STDERR: Host key verification failed.
2021-01-28T16:46:03Z INFO - [client-command] graphql <user>@<localhost>:~/miniconda3/envs/cylc8//bin/cylc
2021-01-28T16:46:03Z INFO - Broadcast set:
	+ [foo-1.1] platform=wobble
2021-01-28T16:46:04Z INFO - [bar.1] status=running (held): (polled)started at 2021-01-28T16:45:42Z  for job(01) flow(s)
2021-01-28T16:46:04Z INFO - [bar.1] -health check settings: execution timeout=None, polling intervals=PT15M,...
2021-01-28T16:46:04Z INFO - [foo-1.1] status=waiting (held): (polled)submission failed at 2021-01-28T16:46:04Z
    for job(01) flow(s)
2021-01-28T16:46:04Z ERROR - [foo-1.1] -submission failed
2021-01-28T16:46:07Z INFO - [client-command] graphql <user>@<localhost>:~/miniconda3/envs/cylc8//bin/cylc
2021-01-28T16:46:07Z INFO - RELEASE: new tasks will be queued when ready
2021-01-28T16:46:07Z INFO - Command succeeded: release(ids=[])
2021-01-28T16:46:07Z INFO - Processing 1 queued command(s)
	+       release(ids=[])
2021-01-28T16:46:09Z INFO - [bar.1] status=running: (received)succeeded at 2021-01-28T16:46:08Z  for job(01) flow(s)
2021-01-28T16:46:09Z WARNING - Suite stalled with unhandled failed tasks:
	* foo-1.1 (submit-failed)
2021-01-28T16:46:40Z WARNING - suite timed out after inactivity for PT30S
2021-01-28T16:46:40Z ERROR - Suite shutting down - Abort on suite inactivity is set
2021-01-28T16:46:40Z INFO - DONE
@MetRonnie MetRonnie added the bug? Not sure if this is a bug or not label Jan 28, 2021
MetRonnie added a commit to MetRonnie/cylc-flow that referenced this issue Jan 28, 2021
(Apart from tests/k/hold-release/13-ready-restart, seems to be broken on 
master: cylc#4052)
@oliver-sanders
Member

What does the job activity log for foo-1.1 say? (cylc cat-log <flow> foo-1.1 -f a)

@MetRonnie
Member Author

> What does the job activity log for foo-1.1 say? (cylc cat-log <flow> foo-1.1 -f a)

[jobs-poll ret_code] 0
[jobs-poll out] 2021-01-28T16:46:04Z|1/foo-1/01|{"job_runner_name": "at", "job_runner_exit_polled": 1}

@oliver-sanders
Member

Note: The Cylc7 ready state is/will-be equivalent to the Cylc8 preparing state.

Here's the fix:

diff --git a/tests/flakyfunctional/hold-release/13-ready-restart/flow.cylc b/tests/flakyfunctional/hold-release/13-ready-restart/flow.cylc
index af9609f7c..9326da49e 100644
--- a/tests/flakyfunctional/hold-release/13-ready-restart/flow.cylc
+++ b/tests/flakyfunctional/hold-release/13-ready-restart/flow.cylc
@@ -50,4 +50,5 @@
             # Release the suite to run to completion.
             sleep 2
             cylc release "${CYLC_SUITE_NAME}"
+            cylc trigger "${CYLC_SUITE_NAME}" foo-1.1
         """

However, we don't want to apply this fix because the logic of the test has been corrupted. It was intended to leave a task in the ready state at the end of the first run, then restart the suite.

However, the task now goes into the submit-failed state at the end of the first run:

2021-01-29T11:17:07Z WARNING - Suite stalled with unhandled failed tasks:
	* bar.1 (failed)
	* foo-1.1 (submit-failed)
2021-01-29T11:17:37Z WARNING - suite timed out after inactivity for PT30S
2021-01-29T11:17:37Z ERROR - Suite shutting down - Abort on suite inactivity is set
2021-01-29T11:17:37Z INFO - DONE

The DB reflects this:

sqlite> select * from task_pool where name == "foo-1";
1|foo-1|G|submit-failed|0

Yet mysteriously the task proxy is loaded in the restart as waiting (bug?):

2021-01-29T11:16:58Z INFO - LOADING task proxies
2021-01-29T11:16:58Z INFO - + bar.1 running
2021-01-29T11:16:58Z INFO - + foo-1.1 waiting

Before being polled back to submit-failed:

2021-01-29T11:17:00Z INFO - [foo-1.1] status=waiting (held): (polled)submission failed at 2021-01-29T11:17:00Z  for job(01) flow(l)

To get a task into the ready state it might be easier just to fiddle the DB rather than trying to naturally run a flow into the desired state.
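
A rough sketch of what fiddling the DB could look like, using Python's `sqlite3` against an in-memory stand-in for the suite database. The column names (`cycle`, `name`, `flow_label`, `status`, `is_held`) are assumptions inferred from the `task_pool` query output above, not a verified schema, and `force_status` is a hypothetical helper. The target state is `preparing`, the Cylc8 equivalent of ready noted earlier.

```python
import sqlite3

# Stand-in for the suite's run database. The column names are assumptions
# inferred from the query output in this thread, not a verified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_pool "
    "(cycle TEXT, name TEXT, flow_label TEXT, status TEXT, is_held INTEGER)"
)
conn.execute(
    "INSERT INTO task_pool VALUES ('1', 'foo-1', 'G', 'submit-failed', 0)"
)


def force_status(conn, cycle, name, status):
    """Overwrite the recorded status of one task; return the rows changed.

    Hypothetical helper for illustration only; not part of Cylc.
    """
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE task_pool SET status = ? WHERE cycle = ? AND name = ?",
            (status, cycle, name),
        )
    return cursor.rowcount


force_status(conn, "1", "foo-1", "preparing")
print(conn.execute("SELECT * FROM task_pool WHERE name = 'foo-1'").fetchall())
```

In a real test this would have to run between the first run and the restart, and other tables may also need to agree with the new status; that has not been checked here.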

Going back to #358:

> This is because releasing the suite resets the 'held' tasks to 'waiting', and that causes its prerequisites to be reset back to unsatisfied.

Since #3230, release only changes the is_held attribute of the task state, so I think this old bug and its legacy test are both obsolete.
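
To illustrate the difference, here is a toy Python model that contrasts the two behaviours described above. `ToyTask`, `release_old`, and `release_new` are hypothetical names, and the model is a simplification that does not reflect Cylc's actual task proxy code. In the old behaviour, release reset a held task to waiting and so cleared its prerequisites. Since #3230, release only flips `is_held`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Toy stand-in for a task proxy; not Cylc's real data model."""
    status: str
    is_held: bool = True
    prerequisites: dict = field(
        default_factory=lambda: {"foo.1 succeeded": True}
    )


def release_old(task: ToyTask) -> None:
    """Old behaviour as quoted above: reset the held task to waiting,
    which also resets its prerequisites to unsatisfied."""
    task.is_held = False
    task.status = "waiting"
    task.prerequisites = {key: False for key in task.prerequisites}


def release_new(task: ToyTask) -> None:
    """Behaviour since #3230 as described above: only is_held changes."""
    task.is_held = False


old = ToyTask(status="held")
new = ToyTask(status="preparing")
release_old(old)
release_new(new)
print(old.status, all(old.prerequisites.values()))
print(new.status, all(new.prerequisites.values()))
```

Under the new behaviour the task keeps both its status and its satisfied prerequisites across a release, which is why the scenario the test was written for no longer arises in this toy model.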

oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Jan 29, 2021
@oliver-sanders oliver-sanders removed the bug? Not sure if this is a bug or not label Jan 29, 2021
@oliver-sanders oliver-sanders self-assigned this Jan 29, 2021
@oliver-sanders oliver-sanders added this to the cylc-8.0a3 milestone Jan 29, 2021
@hjoliver hjoliver modified the milestones: cylc-8.0a3, cylc-8.0b0 Feb 25, 2021