Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe deprecate pattern_from_file_sequence #162

Open
TomAugspurger opened this issue Jun 24, 2021 · 3 comments
Open

Maybe deprecate pattern_from_file_sequence #162

TomAugspurger opened this issue Jun 24, 2021 · 3 comments

Comments

@TomAugspurger
Copy link
Contributor

See #160 (comment) for justification, but the tl/dr is that for large recipes (many inputs), filepatterns_from_sequence can make large function objects.

So while filepatterns_from_sequence is very convenient, I think we're best off guiding users to writing a function that generates the filepattern for an input index dynamically, rather than looking it up from a list. Otherwise recipe reviewers will need to be on the lookout for uses of it when there are a large number of inputs.

@cisaacstern cisaacstern changed the title Maybe deprecate filepatterns_from_sequence Maybe deprecate pattern_from_file_sequence Jun 29, 2021
@cisaacstern
Copy link
Member

(Changed issue name per pangeo-forge/staged-recipes#58 (comment).)

I do have a few open PRs which use pattern_from_file_sequence:

but I'm not especially attached to this method.

In fact, any convenience lost by removing this method is likely offset by the standardization benefit derived from making it so there is, in the words of PEP 20, "... one-- and preferably only one --obvious way to do it."

@cisaacstern
Copy link
Member

Since my last comment on this issue I have found many, many situations where pattern_from_file_sequence has been remarkably helpful, usually for recipes with on the order of 3 - 20 input files. I'd therefore like to amend my previous comment

but I'm not especially attached to this

to say that I am in favor of keeping pattern_from_file_sequence.

Otherwise recipe reviewers will need to be on the lookout for uses of it when there are a large number of inputs.

Maybe we can build some automated checking for this, to minimize toil for maintainers.

@rabernat
Copy link
Contributor

rabernat commented Sep 9, 2021

The only real problem with pattern_from_file_sequence is that it may contribute to the task-graph memory issues that we have struggled with in the past.

The way to resolve this would be to develop quantitative metrics and tests for those memory issues, rather than just waiting to hear that a particular recipe made the bakery crash. This was the point I was trying to make in #160 (comment). Without such metrics, we are flying blind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants