Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update config file path resolution #716

Merged

Conversation

davidorme
Copy link
Collaborator

@davidorme davidorme commented Jan 31, 2025

Description

This PR:

  • Updates core.config._resolve_config_paths to recursively search a config file payload for values stored under keys ending in _path and then resolves those values relative to the config path. The function previously only did this to file entries in [core.data.variable] tables and recent changes in model configurations now need options to resolve other file paths within the config tree.

  • Extends the test suite to check path manipulation and error handling work as expected

  • Updates the core model schema to change file to file_path and then rolls out the change in the [core.data.variable] schema specs across example files and tests to get this to apply. This is a bit of a sweeping change but I think it is long term easier to maintain the pattern "xyz_path needs resolving" than to have to maintain a list that "these specific key paths need resolving".

Fixes # 715

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

@davidorme davidorme linked an issue Jan 31, 2025 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented Jan 31, 2025

Codecov Report

Attention: Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.

Project coverage is 94.69%. Comparing base (4cab25c) to head (d89e8aa).
Report is 4 commits behind head on develop.

Files with missing lines Patch % Lines
virtual_ecosystem/core/config.py 82.35% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #716      +/-   ##
===========================================
- Coverage    94.72%   94.69%   -0.04%     
===========================================
  Files           73       73              
  Lines         4932     4937       +5     
===========================================
+ Hits          4672     4675       +3     
- Misses         260      262       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@davidorme davidorme marked this pull request as ready for review January 31, 2025 21:27
Copy link
Collaborator

@jacobcook1995 jacobcook1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just had a minor documentation suggestion

One last detail of model configurations is that you may want to allow users to specify a
file path to configure your model. This should not be used for loading the data used to
run the model but might be used to load further details. If you do want to include a
file path in your congfiguration, you need to pick a configuration key name that ends in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth giving an example here, as it's pretty technical language which I think becomes much more obvious when you give an example of what a non-data data file might contain

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I have to admit that without the explanation in the PR description, I would not know what this is suppose to be doing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dalonsoa Yup - I've updated both this section and the _resolve_config_paths docstring. Is that now clearer?

Copy link
Collaborator

@jacobcook1995 jacobcook1995 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the part I find difficult is "specify a file path to configure your model", in that I think you can only understand this if you already have a pretty good idea of how the model configuration system works. I guess something expanded on what this means/why you would want to configure the models with additional non-data files would be useful

Copy link
Collaborator Author

@davidorme davidorme Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh FFS. I hadn't pushed my update to that section. I've pushed it and added a concrete example.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah the new text is much clearer!

Copy link
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks OK, but I think the explanation on what this is for and how to use it needs some improvement. As it is now, it is very much unclear.

I've also flagged one potential issue, although that might be better addressed in a separate PR.

One last detail of model configurations is that you may want to allow users to specify a
file path to configure your model. This should not be used for loading the data used to
run the model but might be used to load further details. If you do want to include a
file path in your congfiguration, you need to pick a configuration key name that ends in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I have to admit that without the explanation in the PR description, I would not know what this is suppose to be doing.

@@ -97,36 +97,34 @@ def config_merge(
return dest, conflicts


def _resolve_config_paths(config_dir: Path, params: dict[str, Any]) -> None:
def _resolve_config_paths(config_dir: Path, config_dict: dict[str, Any]) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't checked this back in the day, so I mention it here now. This will work totally find in Mac and Linux (bugs aside) but will fail in Windows if any of the paths is in a different unit than the configuration file. Let's say your config file is in C: but you have all you data files in a separate drive, eg. D:. In this case, there will be an error as it is not possible to stablish relative paths between locations in different units. It's a very specific case, but I mention it because I learn it the hard way!

Copy link
Collaborator Author

@davidorme davidorme Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dalonsoa Just to check I understand this. If we have a config file on Windows, then the file could be C:/config.toml but contain references to D:/data/data1.nc. Within the single config file, does that data path have to be absolute, because there is no way in Windows to get a relative path onto the D:/ drive from C:/?

If that is the case, then we just need to make sure those D:/ references don't get tampered with. I think this is another PR.

Copy link
Collaborator

@dalonsoa dalonsoa Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. In Windows, you cannot stablish relative paths between locations in different units. It just does not let you. At least, pathlib cannot resolve that situation. This is what I get when I tested it a few minutes ago:

>>> from pathlib import Path
>>> a = Path.cwd()
>>> a
WindowsPath('C:/Users/dalonsoa')
>>> b = Path(r"D:/agdp.exe")
>>> b
WindowsPath('D:/agdp.exe')
>>> b.exists()
True
>>> b.relative_to(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\dalonsoa\.pyenv\pyenv-win\versions\3.12.4\Lib\pathlib.py", line 682, in relative_to
    raise ValueError(f"{str(self)!r} is not in the subpath of {str(other)!r}")
ValueError: 'D:\\agdp.exe' is not in the subpath of 'C:\\Users\\dalonsoa'
>>> a.relative_to(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\dalonsoa\.pyenv\pyenv-win\versions\3.12.4\Lib\pathlib.py", line 682, in relative_to
    raise ValueError(f"{str(self)!r} is not in the subpath of {str(other)!r}")
ValueError: 'C:\\Users\\dalonsoa' is not in the subpath of 'D:\\agdp.exe'

Copy link
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much clearer now!

virtual_ecosystem/core/config.py Outdated Show resolved Hide resolved
@davidorme davidorme enabled auto-merge February 4, 2025 11:27
@davidorme davidorme merged commit e5fe458 into develop Feb 4, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend path resolution in configuration files
4 participants