Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add toml support for ParamsDependency #4258

Merged
merged 2 commits into from
Jul 24, 2020

Conversation

dtrifiro
Copy link
Contributor

@dtrifiro dtrifiro commented Jul 22, 2020

We've been testing DVC with a few of our projects and for some of those we use a toml config to store parameters.
This PR adds support for parameters file in toml format. Example:

dvc run -n train                      \
-d data/input.csv                   \
-o model/model.h5                \
-M model/train_history.csv   \
-p params.toml:model.training \
python train.py

Where params.toml looks something like this:

[transformers.numerical_quantile_transformer]
n_quantiles = 300
output_distribution = "uniform"

[model.training]
epochs = 50
verbose = 1

config = yaml.safe_load(fobj)
except yaml.YAMLError as exc:
if self.path_info.suffix == ".yaml":
config = yaml.safe_load(fobj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use yaml to load json as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just try checking toml suffix, and then fallback to yaml.safe_load.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just force-pushed a check for suffix (use yaml.safe_load() for json and yaml)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still this misses the yml. So, it's better to just:

if suffix == "toml":
    # toml load
else:
   # yaml load

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually safe for us to guess the file format from the suffix here? It seems like we'll potential to run into issues where users have params filenames with unexpected capitalization on case-sensitive filesystems, or where they are using .yml instead of .yaml.

It seems to me we should maybe just try to load as yaml (and json) first, then try toml if that fails, then error out?

Copy link
Contributor Author

@dtrifiro dtrifiro Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a loader map which is based on the lowercase file suffix, which defaults to yaml.safe_load for unknown suffixes. I also updated _read_params

Copy link
Contributor

@efiop efiop Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with @pmrowla , it is better to just try to load and catch yaml formatting error than only trying to judge by the extension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the last force-push I ended up using another approach (default to yaml.safe_load, use toml if lowercase suffix matches) because I felt a nested try/catch a bit awkward, especially if there are other loaders different from yaml/json and toml (not that I think there's much more, these 3 alone probably cover most of the use cases).

What is your preference?

Also, I saw that the Travis build fails on Windows and Mac (probably due to misconfiguration?) is there anything I need to do about that? Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop, In case of unknown file format, sure, it could fallback to yaml; but, if the file is clearly a toml file, it should just try to load as toml and fail.

This is what @dtrifiro is doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, let's keep it. 👍

@dtrifiro dtrifiro force-pushed the toml-params-support branch from 06fd02a to 2900649 Compare July 22, 2020 08:23
@dtrifiro dtrifiro force-pushed the toml-params-support branch from 2900649 to edebcb4 Compare July 22, 2020 11:37
res[str(config)] = yaml.safe_load(fobj)
except yaml.YAMLError:
res[str(config)] = ParamsDependency.PARAMS_FILE_LOADERS[
config.suffix.lower()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@shcheklein
Copy link
Member

Please, don't forget to update the docs after this is merged.

@codecov
Copy link

codecov bot commented Jul 22, 2020

Codecov Report

Merging #4258 into master will decrease coverage by 0.47%.
The diff coverage is 95.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4258      +/-   ##
==========================================
- Coverage   91.77%   91.30%   -0.48%     
==========================================
  Files         173      173              
  Lines       11743    11772      +29     
==========================================
- Hits        10777    10748      -29     
- Misses        966     1024      +58     
Impacted Files Coverage Δ
dvc/external_repo.py 86.63% <94.44%> (+0.03%) ⬆️
dvc/scm/git.py 88.12% <100.00%> (+0.27%) ⬆️
dvc/system.py 62.39% <0.00%> (-26.50%) ⬇️
dvc/utils/fs.py 88.88% <0.00%> (-3.97%) ⬇️
dvc/analytics.py 92.59% <0.00%> (-3.71%) ⬇️
dvc/stage/run.py 94.02% <0.00%> (-2.99%) ⬇️
dvc/tree/ssh/connection.py 79.80% <0.00%> (-2.41%) ⬇️
dvc/daemon.py 54.38% <0.00%> (-1.76%) ⬇️
dvc/tree/local.py 93.86% <0.00%> (-0.95%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b77ce02...edebcb4. Read the comment docs.

@efiop efiop requested a review from skshetry July 24, 2020 02:21
@efiop
Copy link
Contributor

efiop commented Jul 24, 2020

@dtrifiro Could you please submit a docs PR for https://dvc.org/doc/command-reference/run ?

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. It'd be great if you could move these logics to somewhere else, like dvc/repo/params/utils.py as load_params(path_info), but I'm fine with it for now though.

@dtrifiro
Copy link
Contributor Author

dtrifiro commented Jul 24, 2020

@dtrifiro Could you please submit a docs PR for https://dvc.org/doc/command-reference/run ?

@efiop
iterative/dvc.org#1612

@efiop
Copy link
Contributor

efiop commented Jul 24, 2020

@dtrifiro Thank you so much! 🙏

@efiop efiop merged commit e2b1fb1 into iterative:master Jul 24, 2020
@dtrifiro dtrifiro deleted the toml-params-support branch July 27, 2020 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants