Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow [[core.data.variable]] to be split across files #710

Merged
merged 4 commits into from
Jan 31, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 3 additions & 4 deletions docs/source/using_the_ve/data/data.md
Original file line number Diff line number Diff line change
Expand Up @@ -210,10 +210,9 @@ file="'../../data/xy_dim.nc'"
var_name="temp"
```

**NOTE**: At the moment,
`core.data.variable` tags cannot be used across multiple toml config files without
causing `ConfigurationError: Duplicated entries in config files: core.data.variable` to
be raised. This means that all variables need to be combined in one `config` file.
You can include `core.data.variable` tags in different files. This can be useful to
group model-specific data with other model configuration options, and allow
configuration files to be swapped in a more modular fashion.

To load configuration data , you will typically use the `cfg_paths` argument
to pass one or more TOML formatted configuration files to create a
Expand Down
23 changes: 23 additions & 0 deletions tests/core/test_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,29 @@
("b.bb.bbb.bbba.bbbaa",),
id="conflict_complex",
),
pytest.param(
{"d1": {"d2": [1, 2, 3]}},
{"d1": {"d2": [4, 5]}},
{"d1": {"d2": [1, 2, 3, 4, 5]}},
(),
id="no_conflict_list_merge",
),
# The next example passes just fine, which is intentional, but the test is here
# to highlight the behaviour
pytest.param(
{"d1": {"d2": [1, 2, 3]}},
{"d1": {"d2": [{"file": "a_path"}]}},
{"d1": {"d2": [1, 2, 3, {"file": "a_path"}]}},
(),
id="no_conflict_list_merge_dubious_content",
),
pytest.param(
{"d1": {"d2": [1, 2, 3]}},
{"d1": {"d2": "a"}},
{"d1": {"d2": "a"}},
("d1.d2",),
id="conflict_list_and_not_list",
),
],
)
def test_config_merge(dest, source, exp_result, exp_conflicts):
Expand Down
23 changes: 19 additions & 4 deletions virtual_ecosystem/core/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,22 @@ def config_merge(
"""Recursively merge two dictionaries detecting duplicated key definitions.

This function returns a copy of the input ``dest`` dictionary that has been extended
recursively with the entries from the input ``source`` dictionary. The two input
dictionaries must not share any key paths and when duplicated key paths are
found, the value from the source dictionary is used and the function extends the
returned ``conflicts`` tuple with the duplicated key path.
recursively with the entries from the input ``source`` dictionary.

In general, if two input dictionaries share complete key paths (that is a set of
nested dictionary keys leading to a value) then that indicates a duplicated setting.
The values might be identical, but the configuration files should not duplicate
settings. When duplicated key paths are found, the value from the source dictionary
is used and the function extends the returned ``conflicts`` tuple with the
duplicated key path.

However an exception is where both entries are lists - resulting from a TOML array
of tables (https://toml.io/en/v1.0.0#array-of-tables). In this case, it is
reasonable to append the source values to the destination values. The motivating
example here are `[[core.data.variable]]` entries, which can quite reasonably be
split across configuration sources. Note that no attempt is made to check that the
combined values are congruent - this is deferred to error handling when the
configuration is used.

Args:
dest: A dictionary to extend
Expand Down Expand Up @@ -66,6 +78,9 @@ def config_merge(
dest[src_key], conflicts = config_merge(
dest_val, src_val, conflicts=conflicts, path=next_path
)
elif isinstance(dest_val, list) and isinstance(src_val, list):
# Both values for this key are lists, so merge the lists
dest[src_key] = [*dest_val, *src_val]
elif dest_val is None:
# The key is not currently in dest, so add the key value pair
dest[src_key] = src_val
Expand Down
10 changes: 5 additions & 5 deletions virtual_ecosystem/core/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -107,11 +107,11 @@
[[core.data.variable]]
var_name="elev"

Data configurations must not contain repeated data variable names. NOTE: At the moment,
```core.data.variable``` tags cannot be used across multiple toml config files without
causing ```ConfigurationError: Duplicated entries in config files: core.data.variable```
to be raised. This means that all variables need to be combined in one ```config```
file.
Data configurations must not contain repeated data variable names.

You can include ```core.data.variable``` tags in different files. This can be useful to
group model-specific data with other model configuration options, and allow
configuration files to be swapped in a more modular fashion.

.. code-block:: python

Expand Down