Reinvigorate remote_archive to support automatic backup of data from ephemeral scratch storage #200
Comments
This sounds good to me. If someone uses |
There is a function in As storage became less of a problem, this function became deprecated and is now basically a zombie function which cannot be called. But it would not be much work to reinstate something like this. Which is to say that we've been here before, albeit under different constraints (space vs time), but it should not be a problem to either append this function to the end of |
Ok. I think this is well overdue to be implemented. |
The COSIMA issue linked above refers to adding syncing of restarts to their existing sync script https://github.com/COSIMA/01deg_jra55_iaf/blob/master/sync_data.sh This script is called with the https://github.com/COSIMA/01deg_jra55_iaf/blob/master/config.yaml#L77 Not everything in that script is appropriate for including in Also important to note the Using |
I've created issue #358 and cross-referenced here. Syncing restarts that will later be deleted is an issue. It could be that the logic for restarts is different than for outputs: could use Another possibility is to have an option to not sync restarts, then tidy them in a separate step and then turn on restart syncing and then call something like
If It would probably require inspection of restart files (so retaining a dependency on the |
Just documenting some notes here: In the COSIMA issue above, it was also noted that payu doesn't automatically collate the most recent restart. If rsync were set to exclude uncollated files, then the most recent restart wouldn't be synced. So One Otherwise, auto syncing of restarts could be set up to only sync restarts using the integer restart frequency (or using a date-based restart frequency). To automatically sync outputs, the sync command could be run where the postscript hook is run, which is at the end of archive if not collating, otherwise after collation. The sync config could look something like:
|
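The frequency-based restart selection mentioned in the comment above could be sketched roughly as follows. This is only an illustration, not payu's implementation: the function name `restarts_to_sync` is hypothetical, and it assumes restart directories are named `restart` followed by a zero-padded integer index, with only indices that are a multiple of the restart frequency being synced.

```python
import re

# Assumed naming convention: "restart" followed by an integer index.
RESTART_PATTERN = re.compile(r"^restart(\d+)$")


def restarts_to_sync(dir_names, restart_freq):
    """Return restart directory names whose index is a multiple of restart_freq.

    Hypothetical helper: names that do not look like restart directories
    (for example output directories) are ignored.
    """
    if restart_freq < 1:
        raise ValueError("restart_freq must be a positive integer")
    selected = []
    for name in sorted(dir_names):
        match = RESTART_PATTERN.match(name)
        if match and int(match.group(1)) % restart_freq == 0:
            selected.append(name)
    return selected
```

With a frequency of 5, for example, `restart000`, `restart005` and `restart010` would be selected and the intermediate restarts skipped, so restarts that are due to be tidied later are never copied.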
NCI is upgrading its HPC and at the same time replacing the `short` filesystem with `scratch`, which is time limited.

`payu` is so well written (props @marshallward) that there is only one mention of the actual path `/short` in the entire codebase:

payu/payu/laboratory.py (line 57 in f567db9)

(we'll ignore hard-coded paths in profiler modules)

So at the very minimum, to support the new machine, `default_short_path` should be changed to `scratch`.

However, as `scratch` is time limited, it is no longer a good fit for the current `payu` pattern, where `bin`, `input` and `codebase` are stored in the same `laboratory` as `work` and `archive`.

payu/payu/laboratory.py (lines 45 to 49 in f567db9)

With strict time-limited deletion of files on `scratch`, the only directory that is a clear fit for this pattern is `work`. The `archive` directory could live on `scratch`, with some syncing to a permanent data store, but I think `payu` should also support `archive` not being physically co-located with `work`.

Thoughts?
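As a rough sketch of the "syncing to a permanent data store" idea, the snippet below builds an `rsync` command that copies an archive directory from `scratch` to another location. The function name `build_sync_command`, the example paths and the exclude pattern are all hypothetical and are not part of payu; only the standard `rsync` options `-a` and `--exclude` are used.

```python
from pathlib import Path


def build_sync_command(archive_dir, remote_dir, exclude_patterns=()):
    """Build an rsync command list that copies archive_dir into remote_dir.

    Hypothetical helper for illustration. The trailing slash on the source
    tells rsync to copy the directory contents rather than the directory itself.
    """
    cmd = ["rsync", "-a"]
    for pattern in exclude_patterns:
        cmd.append(f"--exclude={pattern}")
    cmd.append(str(Path(archive_dir)) + "/")
    cmd.append(str(remote_dir))
    return cmd


# Example with made-up paths; "*.nc.*" is an assumed pattern for uncollated files.
command = build_sync_command(
    "/scratch/proj/user/payu/archive/expt",
    "/g/data/proj/user/expt",
    exclude_patterns=["*.nc.*"],
)
```

The resulting list could be passed to `subprocess.run(command, check=True)`. Building the command separately from running it keeps the logic easy to test without touching the filesystem.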