Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify FM over all grids #6981

Conversation

tcclevenger
Copy link
Contributor

Store a single FieldManager for all fields over all grids.

This solves the issue in #6789 where having a group as a subset of another, and allocated on different grids, was causing subview indices mismatch between grids.

Also

  • Remove Derivation type. Now groups contain the union of all fields registered over all grids. I.e., requesting "tracers" on GLL grid without registering any tracers on GLL will give all tracers registered on PG2 grid.
  • Remove "Preferred" bundling. You either require bundling or not.

@tcclevenger tcclevenger marked this pull request as draft February 7, 2025 01:17
Copy link
Contributor Author

@tcclevenger tcclevenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments highlight the important pieces since this is such a large PR. So many of the changes come from every instance of the FM has to be used differently (passing grid names instead of only one grid being stored).

@tcclevenger
Copy link
Contributor Author

Quite a few fails in the CI, odd some didn't trigger on my workstation (rrtmgp standalone for instance)... Leaving Draft for now.

Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some preliminary thoughts.

@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/single_fm_multiple_grids branch from 952e6bf to cd557bc Compare February 13, 2025 15:52
@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/single_fm_multiple_grids branch 7 times, most recently from a997eed to 9e17dc8 Compare February 20, 2025 18:13
Also
- remove devivation type
- simplify group logic to ensure subfields over grids have same index
EKAT has an incorrect assert on the extents of subview slices, namely
that the end index must be less than the dim extent, when it must be
less than or equal to the last extent (since Kokkos uses exclusive end
bounds). Will require an EKAT PR and update.
The ensures that all groups are registered properly, even if they are
not bundled.
Bundle groups and non-bundled groups need different logic, namely,
bundled groups must register all fields on all grids, non-bundled only
consider fields which have been registered.
@rljacob rljacob added the EAMxx PRs focused on capabilities for EAMxx label Mar 4, 2025
Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I think there are a few things to fix.

@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/single_fm_multiple_grids branch 2 times, most recently from 638914b to e481223 Compare March 5, 2025 20:25
@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/single_fm_multiple_grids branch from e481223 to 2c54e67 Compare March 5, 2025 21:13
…ter lists

In the case of m_params.isParameter("Field Names"), require a single
grid, else ensure that we have the grid named correctly (w.r.t. aliases)
for the FM
@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/single_fm_multiple_grids branch from 5ad01e5 to 1f33c0f Compare March 5, 2025 22:16
@@ -45,7 +45,7 @@ class FieldTracking : public FamilyTracking<FieldTracking> {
const atm_proc_set_type& get_customers () const { return m_customers; }

// List of field groups that this field belongs to
const ekat::WeakPtrSet<const FieldGroupInfo>& get_groups_info () const { return m_groups; }
const ekat::WeakPtrSet<const FieldGroupInfo>& get_group_info () const { return m_groups; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method could have kept the original signature, since in this case we do return all of the groups. My comment was only for the FM method, which returns the info only for one group. Not important though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh sorry, just did a massive replace with sed.. Forgot this method existed, I'll change it back.

@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/single_fm_multiple_grids branch from 66f520a to b946a9c Compare March 6, 2025 15:22
@tcclevenger tcclevenger requested a review from bartgol March 6, 2025 16:15
@tcclevenger
Copy link
Contributor Author

@bartgol Had to correct the equivalent_layout() function in the tensor case, was caught in MPASSI case. Should be good to go.

template_layout.get_vector_component_idx() : -1;
const auto vec_dim = template_layout.is_vector_layout() ?
template_layout.get_vector_dim() : -1;
const auto tensor_dims = template_layout.is_tensor_layout() ?
Copy link
Contributor

@bartgol bartgol Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mahf708 brought this up in a private conversation. I think this is the kind of places where std::optional may be beneficial. RIght now, get_tensor_dims errors out if the layout is not that of a tensor. So before trying to grab the tens dims, we must first check if the layout is indeed that of a tensor. Instead, we could think of just returning std::optional<std::vector<Int>>, with the optional NOT storing a value for non-tensor layouts. Then, the calling code would have to handle the potential error:

auto tensor_dims = template_layout.get_tensor_dims();

Any code that access tensor_dims for non-tensor layouts (without doing some preliminary checks) would get a std::bad_optional_access exception thrown.

I'm not advocating for std::optional, but since Naser brought it up just yesterday in a private convo, I thought I mention it, to see what your opinion is. I think it could potentially make the calling code shorter, removing the responsibility for checks from the provider to the customer (in this case, from the FieldLayout methods, to AbstractGrid::equivalent_layouts). Since the calling function may already make sure that the return value should be fine (in this case, via the switch), it could be cleaner:

auto tensor_dims = template_layout.get_tensor_dims();
...
switch (template_layout.type()) {
...
  case LayoutType::Tensor2D:
        ret_layout = get_3d_tensor_layout(midpoints, *tensor_dims, tdims_names);

I haven't made up my mind yet, but maybe worth thinking about it.

Copy link
Contributor

@bartgol bartgol Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we could also just access the tens dims inside the switch cases that correspond to tensors. I think the reason I did not do that, is that whenever you decl variables local to the case, you need to add a scope, and it looks "odd" when you have a scope for one case and not for others (and it's a lot of extra lines to add the scope for all cases):

switch (a) {
  case A1:
     ...
  case A2:
  {
     auto local_var = ...;
  }
  case A3:
     ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't like the scope in the cases, and then for the vector and tensor quantities you need to repeat code (but the way it's done here we need the extra ? : line).

I don't have experience with std::optional but it sounds like it's meant for this situation. It does make the func signatures more complicated whereas returning an error if we aren't in a tensor case is probably easier to understand to a newer C++ coder. But I need to look more at std::optional to really understand what the changes would be.

I'll merge as is, but may consider it with the follow up I plan to do.

tcclevenger added a commit that referenced this pull request Mar 10, 2025
(PR #6981)

Store a single FieldManager for all fields over all grids.

This solves the issue in PR #6789 where having a group as a subset
of another, and allocated on different grids, was causing subview
indices mismatch between grids.

Also
  - Remove Derivation type. Now groups contain the union of all fields
    registered over all grids. I.e., requesting "tracers" on GLL grid
    without registering any tracers on GLL will give all tracers registered
    on PG2 grid.
  - Remove "Preferred" bundling. You either require bundling or not.
tcclevenger added a commit that referenced this pull request Mar 10, 2025
(PR #6981)

Store a single FieldManager for all fields over all grids.

This solves the issue in PR #6789 where having a group as a
subset of another, and allocated on different grids, was
causing subview indices mismatch between grids.

Also
  - Remove Derivation type. Now groups contain the union of
    all fields registered over all grids. I.e., requesting
    "tracers" on GLL grid without registering any tracers on
    GLL will give all tracers registered on PG2 grid.
  - Remove "Preferred" bundling. You either require bundling or not.

[BFB]
@tcclevenger tcclevenger merged commit 7c43a73 into E3SM-Project:master Mar 10, 2025
18 of 19 checks passed
@tcclevenger tcclevenger deleted the tcclevenger/eamxx/single_fm_multiple_grids branch March 10, 2025 13:19
tcclevenger added a commit that referenced this pull request Mar 11, 2025
…7109)

This was left after imported group tag were removed in #6981, but never
called and the GroupRequest constructor no longer exists. Interestingly,
only some compilers complain.

[BFB]
tcclevenger added a commit that referenced this pull request Mar 11, 2025
This was left after imported group tag were removed in #6981, but never 
called and the GroupRequest constructor no longer exists. Interestingly, 
only some compilers complain.

[BFB]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EAMxx PRs focused on capabilities for EAMxx
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants