File housekeeping utility. #1159
Comments
UPDATE: a-ha, it looks like these built-in apps do not handle files "older than" (as opposed to "at") the cycle point offset, but that doesn't really matter. In my old utility, I was trying to automatically handle the case of changing to a smaller offset mid-run. That's obviously more difficult than matching a single specific cycle point: it requires a regex that matches any cycle point, which now depends on the format in use, or else it has to match all possible formats. |
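To illustrate the "match all possible formats" option, here is a minimal sketch, not taken from cylc or rose. The pattern, `find_cycle_point`, and `is_older_than` are hypothetical names, and the pattern only covers a few common basic and extended ISO 8601 layouts (for example CCYYMMDDHH, CCYYMMDDTHHMMZ, CCYY-MM-DDTHH); a real suite could use formats it does not handle.

```python
import re
from datetime import datetime

# Hypothetical pattern (an assumption, not cylc's own): optional "-" and ":"
# delimiters, optional "T" and trailing "Z", optional minutes.
CYCLE_POINT_RE = re.compile(
    r"(?<!\d)(?P<year>\d{4})-?(?P<month>\d{2})-?(?P<day>\d{2})"
    r"T?(?P<hour>\d{2})(?::?(?P<minute>\d{2}))?Z?(?!\d)"
)


def find_cycle_point(name):
    """Return the first date-time label found in name, or None."""
    for match in CYCLE_POINT_RE.finditer(name):
        try:
            return datetime(
                int(match["year"]),
                int(match["month"]),
                int(match["day"]),
                int(match["hour"]),
                int(match["minute"] or 0),
            )
        except ValueError:
            # Digits matched but are not a valid date-time (e.g. month 13).
            continue
    return None


def is_older_than(name, cutoff):
    """True if name carries a date-time label strictly before cutoff."""
    point = find_cycle_point(name)
    return point is not None and point < cutoff
```

With this, `is_older_than("out_2015010106.nc", datetime(2015, 1, 2, 6))` selects the file regardless of whether the suite itself cycles with CCYY-MM-DDTHH or CCYYMMDDTHHMMZ. The cost is the risk of false matches on other long digit runs, which is why matching a single known label is the simpler design.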
Yes,
I think most of
It is less clear to me whether we should move |
In the latest version of |
A quick brain dump... It should be relatively straightforward to move job logs housekeep
In cylc, we can also do:
|
We should also be able to housekeep the contents of the databases, as they can become overly large over time. |
@arjclark - yes, DB housekeeping would be good. Maybe just deleting entries beyond some configurable cutoff would do. |
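As a rough sketch of "deleting entries beyond some configurable cutoff", assuming an SQLite database and a table with a text column named `cycle` holding fixed-width ISO 8601 labels: `housekeep_table` and the column name are hypothetical, and this does not reflect the real suite database schema.

```python
import sqlite3


def housekeep_table(conn, table, cutoff):
    """Delete rows whose cycle label sorts before cutoff; return the count.

    Assumes every label in the (hypothetical) "cycle" column uses the same
    fixed-width format as cutoff, so string order equals time order.
    """
    # Table names cannot be bound as SQL parameters, so validate before
    # interpolating the name into the statement.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cursor = conn.execute(f'DELETE FROM "{table}" WHERE cycle < ?', (cutoff,))
    conn.commit()
    return cursor.rowcount
```

For example, `housekeep_table(conn, "task_events", "20150102T0000Z")` would drop every row labelled before that point. A real implementation would also need to avoid tables that the suite requires for restart.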
DB tables we can housekeep:
DB tables we cannot housekeep:
DB tables we may be able to housekeep:
See also #1827. |
[meeting] we agreed:
(need to be careful of any clash between DB and file housekeeping offsets) |
NIWA operations reports that (with older cylc versions) db locking issues were strongly correlated with the size of the suite db (presumably because read times became significantly longer, perhaps on a slow filesystem). They used to wipe a db and restart the suite from scratch occasionally, which would fix the problem. This isn't an issue now with our robust lock recovery mechanism, but if db ops do (or can) slow significantly with db size, then automatic housekeeping would be a good thing. |
@benfitzpatrick - the above comment looks related to your rose bush timings investigations |
We think we can find at least a factor of 2 speed-up for jobs and cycle views in Rose Bush, which I assume is always the dominant reader of the public database. Bigger databases are slower... |
Had a quick discussion with @dpmatthews. Much of the disk usage comes from large job log files and from the large number of job logs per task submit. It may be worthwhile to housekeep them more aggressively. E.g.:
|
Note if it is hard to keep |
Cylc really needs a built-in file housekeeping utility, for archiving (by copy or move) and deletion of date-time labeled files and directories older than some offset from the current cycle point.
The old cylc housekeeping command was removed at cylc-6 because it wasn't ISO 8601 compatible, and it had a serious deficiency that I never got around to addressing: it was unable to match individual files below a date-time labeled directory. Aside from that it was quite nice in some respects: it was controlled by simple config files, and it performed its configured operations in parallel.
For cylc-6+ a general housekeeping utility can no longer assume a simple fixed-format cycle time (see #1158). It would have to be aware of the suite's cycle point format (actually it's worse than this: a suite using cycle point format CCYY-MM-DDTHH could still choose to use filenames containing CCYYMMDDHH for compatibility with external systems, for example).
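To make the filename-format point concrete, here is a minimal sketch of the simpler "at the offset" case, where the label format used in filenames is configured separately from the suite's cycle point format. The function `housekeep_at_offset` and all of its parameters are hypothetical; this is not the removed command or any existing cylc or rose API.

```python
import shutil
from pathlib import Path


def housekeep_at_offset(root, current_point, offset, label_format,
                        operation, archive_dir=None):
    """Delete, move or copy entries in root labelled current_point - offset.

    current_point is a datetime, offset a timedelta, and label_format a
    strftime format describing how filenames are labelled (an assumption).
    Returns the names of the entries handled.
    """
    if operation not in ("delete", "move", "copy"):
        raise ValueError(f"unknown operation: {operation!r}")
    if operation != "delete" and archive_dir is None:
        raise ValueError("move and copy need an archive_dir")

    label = (current_point - offset).strftime(label_format)
    handled = []
    # sorted() lists the directory up front, so removing entries is safe.
    for path in sorted(Path(root).iterdir()):
        if label not in path.name:
            continue
        if operation == "delete":
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
        else:
            Path(archive_dir).mkdir(parents=True, exist_ok=True)
            dest = Path(archive_dir) / path.name
            if operation == "move":
                shutil.move(str(path), str(dest))
            elif path.is_dir():
                shutil.copytree(path, dest)
            else:
                shutil.copy2(path, dest)
        handled.append(path.name)
    return handled
```

A suite cycling with CCYY-MM-DDTHH but writing CCYYMMDDHH filenames would pass `label_format="%Y%m%d%H"`. Because only one label is matched, nothing older than the offset is caught, and files directly below a date-time labeled directory are handled only by acting on the directory as a whole.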
At NIWA we currently use a (very non-general) in-house shell script for housekeeping. @matthewrmshin - how is this handled at the Met Office?