Ability to save experiment directory (EXPDIR) #2994

Closed
1 task
KateFriedman-NOAA opened this issue Oct 8, 2024 · 7 comments · Fixed by #3105

@KateFriedman-NOAA
Member

What new functionality do you need?

A switch or similar setting to allow users to save their EXPDIR (e.g. to HPSS).

What are the requirements for the new functionality?

That the contents of the EXPDIR are saved/archived.

Acceptance Criteria

  • EXPDIR saved/archived to HPSS

Suggest a solution (optional)

No response

@KateFriedman-NOAA KateFriedman-NOAA added the feature New feature or request label Oct 8, 2024
@DavidHuber-NOAA DavidHuber-NOAA self-assigned this Nov 4, 2024
@DavidHuber-NOAA
Contributor

I'm wondering how often this should be saved since it isn't uncommon for the EXPDIR contents to be modified during an experiment. Should this be once per experiment, once per cycle, or perhaps on the first and last cycle?

@KateFriedman-NOAA
Member Author

My thought is at least the first cycle so we can capture the configs at the start. The last cycle would also be good to do, especially to capture the final state of the db/xml and configs. In between that, maybe every 00z so we can capture config changes and save the db/xml at certain points in case they are needed? Definitely first and last though.

@DavidHuber-NOAA
Contributor

Alright, sounds good. They won't be large tarballs (less than 1MB), so I will aim to store them every 00z during gdas or gefs archiving.

@KateFriedman-NOAA
Member Author

Are you thinking of archiving the entire EXPDIR as is, or having a list of files to archive? I ask because sometimes users will have more in their EXPDIR than just the configs, db, xml, and other files that get generated by the workflow. For example, they may have code or a clone of something that they are using in the experiment. Do we want to include that too if it's in the EXPDIR?

@DavidHuber-NOAA
Contributor

Hmm, good question. I had considered just the XML, database, configs, and possibly the logs. I could see the desire to add other things in there, but if they have a copy of the global workflow in the EXPDIR, that would be an extremely long-running htar command and I'm not sure how the symlinks would be handled. I think we should limit it to just a limited set of files.

That said, we could add a function to get the hashes and diffs of the HOMEgfs global workflow clone and all submodules then add that to a text file to be archived with the EXPDIR.
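
A minimal sketch of such a function, assuming standard `git` commands are available on the system. The name `write_git_state`, its arguments, and the layout of the output file are hypothetical and not taken from the workflow:

```bash
# Hypothetical helper: record the state of a git clone in a text file.
# Arguments: path to the clone, output file, YES/NO for including diffs.
# The output file should live outside the clone so it does not show up
# as an untracked file in its own 'git status' listing.
write_git_state() {
    local repo="$1" out="$2" with_diffs="${3:-NO}"
    {
        echo "# HEAD commit"
        git -C "$repo" rev-parse HEAD
        echo "# Submodule commits"
        git -C "$repo" submodule status --recursive
        echo "# git status"
        git -C "$repo" status --short
        if [ "$with_diffs" = "YES" ]; then
            echo "# git diff"
            git -C "$repo" diff
        fi
    } > "$out"
}
```

Called as `write_git_state "${HOMEgfs}" git_state.txt YES`, this would leave a single text file that could be added to the EXPDIR tarball.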

@KateFriedman-NOAA
Member Author

> I think we should limit it to just a limited set of files.

Fully agree.
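
As a rough illustration of what a limited set could look like, the sketch below bundles only top-level files matching a few patterns. The function name `archive_expdir_subset` and the patterns (`config.*`, `*.xml`, `*.db`) are assumptions made for illustration, and plain `tar` stands in for `htar`:

```bash
# Hypothetical helper; the file patterns are illustrative assumptions.
# Arguments: experiment directory, path of the tarball to create
# (absolute, or relative to the current directory).
archive_expdir_subset() {
    local expdir="$1" tarball="$2"
    # List only top-level regular files that match the chosen patterns,
    # so nested clones and symlinks are never descended into or archived.
    ( cd "$expdir" && find . -maxdepth 1 -type f \
        \( -name 'config.*' -o -name '*.xml' -o -name '*.db' \) ) |
        tar -czf "$tarball" -C "$expdir" -T -
}
```

Restricting the search to regular files at depth 1 keeps the tarball small and sidesteps the symlink question raised above.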

> That said, we could add a function to get the hashes and diffs of the HOMEgfs global workflow clone and all submodules then add that to a text file to be archived with the EXPDIR.

Oooooo I like that! That would be very handy information to archive.

@AndrewEichmann-NOAA
Contributor

Hashes and diffs would be wonderful, plus the EXPDIR as it exists (to pick up modifications).

DavidHuber-NOAA added a commit that referenced this issue Dec 9, 2024
…3105)

# Description
This adds the capability to archive the experiment directory.
Additionally, this adds options to run `git status` and `git diff` on
the `HOMEgfs` global workflow (but not the submodules) and store that
information within the experiment directory's archive. These options are
specified in `config.base` with the following defaults:

```bash
export ARCH_EXPDIR='YES'     # Archive the EXPDIR configs, XML, and database
export ARCH_EXPDIR_FREQ=0    # How often to archive the EXPDIR in hours or 0 for first and last cycle only
export ARCH_HASHES='YES'     # Archive the hashes of the GW and submodules and 'git status' for each; requires ARCH_EXPDIR
export ARCH_DIFFS='NO'       # Archive the output of 'git diff' for the GW; requires ARCH_EXPDIR
```
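
To illustrate how these settings might interact, here is a hedged sketch of the frequency check. It is not the merged implementation: the helper names are hypothetical, and it assumes that cycles are `YYYYMMDDHH` strings, that a nonzero interval is counted from the first cycle, and that the first and last cycles are always archived.

```bash
# Hypothetical helpers; not the workflow's actual code.

# Convert a YYYYMMDDHH cycle string to seconds since the epoch (UTC).
# Assumes a `date` that accepts -d "YYYY-MM-DD HH:MM:SS" (e.g. GNU date).
cycle_to_epoch() {
    local c="$1"
    local ymd hh
    ymd="$(echo "$c" | cut -c1-4)-$(echo "$c" | cut -c5-6)-$(echo "$c" | cut -c7-8)"
    hh="$(echo "$c" | cut -c9-10)"
    date -u -d "${ymd} ${hh}:00:00" +%s
}

# Succeed (exit status 0) when the EXPDIR should be archived this cycle.
# Arguments: frequency in hours, current cycle, first cycle, last cycle.
should_archive_expdir() {
    local freq="$1" current="$2" first="$3" last="$4"
    local elapsed_hours
    # Assumption: the first and last cycles are always archived.
    if [ "$current" = "$first" ] || [ "$current" = "$last" ]; then
        return 0
    fi
    # A frequency of 0 means first and last cycle only.
    if [ "$freq" -eq 0 ]; then
        return 1
    fi
    # Assumption: the interval is counted from the first cycle.
    elapsed_hours=$(( ( $(cycle_to_epoch "$current") - $(cycle_to_epoch "$first") ) / 3600 ))
    [ $(( elapsed_hours % freq )) -eq 0 ]
}
```

Under these assumptions, with `ARCH_EXPDIR_FREQ=12` and a first cycle of `2024100800`, the check would pass for `2024100812` but not for `2024100806`.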

Resolves #2994
# Type of change
- [x] New feature (adds functionality)

# Change characteristics
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? YES
- Does this change require an update to any of the following submodules?
YES (If YES, please add a link to any PRs that are pending.)
  - [x] wxflow NOAA-EMC/wxflow#45

# How has this been tested?
- [x] Local archiving on Hercules for a C48_ATM case
- [x] Cycled testing on Hercules with `ARCH_DIFFS=YES` and
`ARCH_EXPDIR_FREQ=6,12`
- [x] Testing with `ARCH_EXPDIR=NO` or `ARCH_HASHES=NO`

# Checklist
- [x] Any dependent changes have been merged and published
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have documented my code, including function, input, and output
descriptions
- [x] My changes generate no new warnings
- [x] New and existing tests pass with my changes
- [x] This change is covered by an existing CI test or a new one has
been added
- [x] Any new scripts have been added to the .github/CODEOWNERS file
with owners
- [x] I have made corresponding changes to the system documentation if
necessary

---------

Co-authored-by: Walter Kolczynski - NOAA <[email protected]>