Add to the new troubleshooting section once #638 is merged.
The Cylc UIs show the scheduler's current knowledge of task and job state. For active tasks, that involves interaction with the external world:
- a task enters the "submitted" state if the job runner successfully returns a job ID on job submission
- then it enters the "running" state if the submitted job returns a "started" status message
- and finally, it enters the "succeeded" or "failed" state if the running job returns a corresponding status message
(Note that the above assumes TCP job status messaging; otherwise the scheduler periodically polls for job status.)
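As a rough illustration of why a lost message leaves a task "stuck", the transitions above can be sketched as a small state machine. This is a simplified model, not the scheduler's actual implementation: the `PRE_SUBMIT` state and the event names are hypothetical placeholders, and out-of-order or skipped messages are simply ignored here, which the real scheduler may handle differently.

```python
from enum import Enum


class TaskState(Enum):
    PRE_SUBMIT = "pre-submit"  # hypothetical placeholder for "not yet submitted"
    SUBMITTED = "submitted"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# (current state, external event) -> next state.
# Event names are illustrative, not real Cylc message identifiers.
TRANSITIONS = {
    (TaskState.PRE_SUBMIT, "job_id_returned"): TaskState.SUBMITTED,
    (TaskState.SUBMITTED, "started"): TaskState.RUNNING,
    (TaskState.RUNNING, "succeeded"): TaskState.SUCCEEDED,
    (TaskState.RUNNING, "failed"): TaskState.FAILED,
}


def next_state(state, event):
    """Return the new state, or the unchanged state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def replay(events, state=TaskState.PRE_SUBMIT):
    """Apply the events that actually reached the scheduler, in order."""
    for event in events:
        state = next_state(state, event)
    return state


# All messages arrive: the displayed state matches reality.
print(replay(["job_id_returned", "started", "succeeded"]).value)

# The job ran and finished, but no status message got through:
# the scheduler only ever saw the job ID, so the task stays "submitted".
print(replay(["job_id_returned"]).value)
```

In this model the displayed state depends only on the events the scheduler received, not on what the job actually did.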
Tasks may get "stuck" in an incorrect state if anything blocks this external job status information. For instance, you may see a task that stays in the "submitted" state even though it actually ran and completed.
Polling the task - by which the scheduler queries the job runner and checks the job.status file - will return the correct result, but you may still need to determine what went wrong.
Incorrect task status implies one of two things:
- the job status message was not sent by the job
  - this implies the job was hard-killed (SIGKILL) or the host went down
  - (a soft kill or job failure will cause a "failed" status message to be sent before exit)
- or the job ran and completed but was unable to send status messages back
  - this implies network issues blocked the send
  - or the job could not find the Cylc package on the job host, to send the message
You can determine what happened by examining the job logs:
- if the job finished (succeeded or failed) that will be recorded in the job.status log regardless of message send
- if message send failed, the job.err log will record errors (this will not stop the job from completing, however)
- if the job.status file does not record completion, and the job is no longer present in the job runner queue, then the job must have been hard-killed
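The checks above can be expressed as a small decision function. This is only a sketch under stated assumptions: it assumes job.status holds `KEY=VALUE` lines and that completion is recorded under a key named `CYLC_JOB_EXIT` (verify against your own job.status files), and the function names and the two boolean inputs are hypothetical, to be supplied by whoever inspects job.err and the job runner queue.

```python
def parse_job_status(text):
    """Parse KEY=VALUE lines (the assumed job.status layout) into a dict."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no "="
            entries[key.strip()] = value.strip()
    return entries


def diagnose(status_text, send_errors_in_job_err, in_runner_queue,
             exit_key="CYLC_JOB_EXIT"):
    """Classify a stuck task from its job logs.

    exit_key is an assumption about how job.status records completion.
    """
    status = parse_job_status(status_text)
    if exit_key in status:
        # The job finished; job.status records this regardless of message send.
        if send_errors_in_job_err:
            return "finished; status message send failed (see job.err)"
        return "finished; no send error logged, poll the task to resync"
    if in_runner_queue:
        return "not finished; job is still known to the job runner"
    # No completion recorded and the job is gone from the queue.
    return "hard-killed, or the job host went down"


# Example: the job completed but could not send its message back.
example_status = "CYLC_JOB_ID=12345\nCYLC_JOB_EXIT=SUCCEEDED\n"
print(diagnose(example_status, send_errors_in_job_err=True, in_runner_queue=False))

# Example: no completion recorded and the job has left the queue.
print(diagnose("CYLC_JOB_ID=12345\n", send_errors_in_job_err=False, in_runner_queue=False))
```

The function only orders the checks; deciding whether job.err actually contains message-send errors, and whether the job is still queued, is left to the caller.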