Currently, when a training run is restarted, the lcurve.out file is overwritten. It would be more natural to append to the existing file, so that the entire training history is kept in a single file.
Detailed Description
When a training run crashes and I want to restart it from the latest checkpoint (ckpt), I have to move the ckpt and input file into a separate folder to keep the log files from being overwritten. This complicates downstream workflows such as training history visualization. If there is no technical obstacle, I believe appending to the existing log file is more desirable than overwriting it.
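To make the request concrete, here is a minimal sketch of the append-on-restart behavior I have in mind. It is not the trainer's actual code: the function names `open_training_log` and `write_step`, the `restart` flag, and the two-column header are all hypothetical and only illustrate the idea of choosing the file mode based on whether the run is a restart.

```python
import os

# Hypothetical header line; the real lcurve.out columns depend on the training setup.
HEADER = "# step loss\n"


def open_training_log(path, restart):
    """Open the learning-curve file, appending on restart instead of truncating.

    `restart` is an assumed flag meaning "this run resumes from a checkpoint".
    """
    # Append only if we are restarting and there is existing history to keep.
    append = restart and os.path.exists(path) and os.path.getsize(path) > 0
    log = open(path, "a" if append else "w")
    if not append:
        # Fresh file: write the header once so the history stays parseable.
        log.write(HEADER)
    return log


def write_step(log, step, loss):
    """Write one record and flush, so a crash loses at most the current line."""
    log.write(f"{step:8d} {loss:.6e}\n")
    log.flush()
```

With something like this, a fresh run still starts a clean file, while a restarted run continues the same lcurve.out, so visualization scripts can read a single file.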
Further Information, Files, and Links
No response
Interesting. This happens to me for both single-task and multi-task training. It may be related to the cloud server I use, Ali-Pai; I will look into that.