Add optional 'time_unlimited' logical flag to model_configure #1535
Conversation
@DusanJovic-NOAA Currently, when we write tiled output, the NetCDF time dimension is unlimited. With time_unlimited set to false by default, does this mean tiled output will no longer have an unlimited time dimension?
Tiled history files (each tile in a separate file) are created using ESMF I/O and are not affected by this option.
@DusanJovic-NOAA Are you working today? If so, we could combine this PR with #1538.
Yes, I'm working today. Let me first sync my branches with develop.
Should I merge #1538 into this PR?
Yes, go ahead and merge in #1538. It is a PR template update with no baseline changes, so it is fine to combine.
Merged #1538.
on-behalf-of @ufs-community <[email protected]>
Automated RT Failure Notification
@DusanJovic-NOAA On Gaea, can you take a look at /lustre/f2/scratch/Jong.Kim/FV3_RT/rt_16491_troubleshoot? I think this is similar to the issue on Cheyenne for control_c384 and control_c384gdas.
I am not sure whether we need @uturuncoglu's opinion. We thought it was a memory issue, but just in case, we may need to look at it from the ESMF point of view. @uturuncoglu We created #1549. We first saw this issue on Cheyenne, but it seems to be happening on Gaea as well.
Let me try running these two tests on Gaea with fewer tasks per node.
The regression tests passed on Gaea using TPN=18 for control_c384 and control_c384gdas.
Cool! Let me finish up the RT log on Gaea. I will try on Cheyenne as well.
I'm confused why the changes in this PR have any impact on the resource needs. The unlimited option is off by default, correct? So why are we seeing a resource impact?
The changes in this PR are not the reason for the failures; the more likely cause is the switch to ESMF-managed threading a few commits earlier. For some reason these two tests occasionally fail on Gaea and/or Cheyenne, most probably due to memory limits.
@DusanJovic-NOAA Thanks, that is also what I suspect. But I didn't think the ESMF-managed threading changed any of the resource allocations, so jobs that ran with one thread before should still be running with one thread afterwards, and so on.
I am turning on control_c384 on Cheyenne. It runs OK with TPN=18.
I see that in PR #1523, in commit a941652, I changed the number of threads for the c384 tests to 1 after @jkbk2004 reported a failure on Gaea (#1523 (comment)). That seemed to solve the issue then, but in this PR these two tests failed again.
This feels like playing whack-a-mole.
I will keep watching. The odds seem to be reduced. BTW, I think the git yaml script to pick up git.run.id might be a bit outdated. I will test on my side.
So, all tests are set. We can start the merging process.
@DusanJovic-NOAA The fv3 PR was merged.
PR Checklist
This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.
This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR
An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.
Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.
New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.
Instructions: All subsequent sections of text should be filled in as appropriate.
The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.
Description
Add a new model_configure parameter (logical flag), 'time_unlimited'. This flag is set to .false. by default. When the user explicitly sets it to .true. in the model_configure file, the time dimension in fv3atm history files will be a record dimension (i.e., unlimited).
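To illustrate the default-off behaviour described above, here is a minimal Python sketch of how an optional logical flag could be read from a model_configure-style file. It is not the fv3atm implementation (which is Fortran and reads the file through ESMF). The `read_logical` helper, the `label: value` line layout, the `#` comment handling, and the sample file contents are all assumptions made for illustration.

```python
def read_logical(config_text, key, default=False):
    """Return the Fortran-style logical stored under `key`.

    Hypothetical helper: assumes one `label: value` pair per line and
    `#` starting a comment. Falls back to `default` when the key is absent.
    """
    truthy = {".true.", ".t.", "true", "t"}
    falsy = {".false.", ".f.", "false", "f"}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        if name.strip() != key:
            continue
        token = value.strip().lower()
        if token in truthy:
            return True
        if token in falsy:
            return False
        raise ValueError(f"{key}: not a logical value: {value.strip()!r}")
    return default


# Illustrative file contents only; not copied from a real model_configure.
without_flag = "quilting: .true.\n"
with_flag = "quilting: .true.\ntime_unlimited: .true.\n"

print(read_logical(without_flag, "time_unlimited"))  # flag absent, default applies
print(read_logical(with_flag, "time_unlimited"))     # flag set explicitly
```

Because the flag is optional with a default of .false., existing model_configure files that do not mention 'time_unlimited' keep their current behaviour.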
Issue(s) addressed
Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)
Testing
How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)
Dependencies
If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).
Do PRs in upstream repositories need to be merged first? Yes.
If so, add the "waiting for other repos" label and list the upstream PRs.