Use restart md5 hashes for reproducibility tests #278

Open
aekiss opened this issue Feb 19, 2025 · 4 comments

Comments

@aekiss
Contributor

aekiss commented Feb 19, 2025

Following from #266 (comment), I think comparing the md5 hashes in manifests/restart.yaml makes an ideal test for detecting reproducibility failures, because they are

  • sensitive and thorough: these md5 hashes will detect bitwise changes in model state arising from any part of any component
  • already generated by payu https://payu.readthedocs.io/en/latest/manifests.html#manifest-contents
  • low-maintenance: CI will not need to be updated to handle model config changes, because the restarts (by definition) cover all the prognostic variables for all components
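A minimal sketch of the comparison, assuming each manifest has already been parsed into a dict in which every tracked path has a `hashes` entry holding `md5` and `binhash` (the layout described in the payu manifest docs linked above). The function names, paths and hash values are hypothetical, and loading the YAML is left to the caller.

```python
def md5_by_path(manifest):
    """Map each tracked path to its md5 hash, ignoring binhash."""
    return {path: entry["hashes"]["md5"] for path, entry in manifest.items()}


def diff_restart_manifests(manifest_a, manifest_b):
    """Return sorted paths whose md5 differs, or that exist in only one manifest."""
    a = md5_by_path(manifest_a)
    b = md5_by_path(manifest_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


# Hypothetical parsed manifests from two runs (hash values are made up).
run_a = {
    "work/INPUT/ocean.res.nc": {"hashes": {"binhash": "aaa1", "md5": "1111"}},
    "work/INPUT/ice.res.nc": {"hashes": {"binhash": "aaa2", "md5": "2222"}},
}
run_b = {
    # binhash differs here but md5 matches, so this file counts as reproduced.
    "work/INPUT/ocean.res.nc": {"hashes": {"binhash": "bbb1", "md5": "1111"}},
    # md5 differs, so this file is reported.
    "work/INPUT/ice.res.nc": {"hashes": {"binhash": "bbb2", "md5": "9999"}},
}

mismatches = diff_restart_manifests(run_a, run_b)
print(mismatches)
```

An empty list would mean that all restarts match bitwise. A file missing from one manifest is also reported, since a change in the set of restart files is itself a difference worth flagging.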

However, there are some wrinkles:

  • Irrelevant restart metadata differences, such as internally stored timestamps, would also alter the md5 hash. However, there were no problems of this sort in the ACCESS-OM2 non-reproducible runs (access-om2#266), and I think this was also true of ACCESS-OM3 (not bitwise reproducible #40). Is that still the case? (Note: we need to use md5 rather than binhash because, IIRC, binhash includes the file modification date.)

  • manifests/restart.yaml tracks the initial rather than final state, so we'd need to update the manifests after the final run before comparing them (does payu setup do that?)

  • While the restarts would be excellent for initial detection of repro failures, they are less useful for diagnosing their cause, due to a lack of granularity in time or component subroutines. So there's still a place for checking other component-specific files for diagnosis e.g. ocean.stats.
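To illustrate the first wrinkle, here is a small sketch showing that an md5 of the file contents ignores the file's modification time but changes with any byte stored inside the file, which is why timestamps written into the restart itself would matter. It uses only the Python standard library; the file name and contents are made up, and this is not payu's own hashing code.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "restart.bin")  # hypothetical stand-in for a restart file
    with open(path, "wb") as f:
        f.write(b"prognostic state")
    before = file_md5(path)

    # Change only the access and modification times; the contents are untouched.
    os.utime(path, (0, 0))
    after_touch = file_md5(path)

    # Append a single byte, e.g. a changed timestamp stored inside the file.
    with open(path, "ab") as f:
        f.write(b"\x00")
    after_edit = file_md5(path)

print(before == after_touch, before == after_edit)
```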

@aekiss
Contributor Author

aekiss commented Feb 19, 2025

Related: #41

@anton-seaice
Contributor

Also captured in ACCESS-NRI/model-config-tests#86

@aekiss
Contributor Author

aekiss commented Feb 19, 2025

Ah thanks, I thought this had already been suggested somewhere!

@dougiesquire
Collaborator

There's also some related discussion in this issue where some potential problems with using the md5 restart hashes are raised.

MOM6 conveniently includes checksums for each restart variable in the metadata of the restart files, so using these seems like the best of all worlds.
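As a rough sketch of how per-variable checksums could give finer granularity than a whole-file md5, the comparison below takes two mappings from variable name to checksum string and lists the variables that differ. Reading the checksums out of the restart metadata is left to the caller; the variable names and checksum values are hypothetical.

```python
def changed_variables(checksums_a, checksums_b):
    """Return sorted variable names whose checksums differ or are missing in one run."""
    names = checksums_a.keys() | checksums_b.keys()
    return sorted(v for v in names if checksums_a.get(v) != checksums_b.get(v))


# Hypothetical per-variable checksums gathered from two restart files.
run_a = {"Temp": "A1B2C3D4", "Salt": "0F0F0F0F", "h": "DEADBEEF"}
run_b = {"Temp": "A1B2C3D4", "Salt": "0F0F0F10", "h": "DEADBEEF"}

changed = changed_variables(run_a, run_b)
print(changed)
```

Unlike a single file hash, this names the affected variables, and it would not be affected by unrelated metadata such as timestamps.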


3 participants