Having worked with pangeo_forge recipes extensively over the past month, I am now considering some potential internal refactors to simplify the code base.
Current situation
Currently, most of the logic lives in the recipe module, while the patterns module has a few simple routines to generate filenames. There is a lot of implicit logic in the recipe classes about how files are organized. That's why we have separate classes for NetCDFtoZarrSequentialRecipe and NetCDFtoZarrMultiVarSequentialRecipe. I have come to feel that this is not a clear separation of concerns.
Proposal: move all logic about how files are organized into the patterns module
Instead, we could imagine having a Pattern object represent everything about how a particular set of files is organized. It would explain:
What are the "keys" used to generate the filenames
How to format the filenames
How different keys are related. For example, time might be a "concat" key, while variable might be a "merge" key. This would be similar to NcML.
A recipe could then look at a Pattern and decide what to do. (It might decide it can't support that pattern and raise an error.) But then we would only need one XarrayZarrRecipe.
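To make the proposal more concrete, here is a minimal sketch of what such a Pattern object could look like. Nothing here is existing pangeo_forge API: the names `CombineOp`, `KeyDim`, `Pattern.items`, `Pattern.dims_with`, and `check_supported`, as well as the example URL, are hypothetical and only illustrate the three responsibilities listed above (the keys, the filename formatting, and how the keys relate).

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Any, Callable, Dict, Iterator, List, Tuple


class CombineOp(Enum):
    """Hypothetical: how files along a key are combined."""
    CONCAT = "concat"  # e.g. time: files are concatenated along a dimension
    MERGE = "merge"    # e.g. variable: files are merged into one dataset


@dataclass(frozen=True)
class KeyDim:
    """Hypothetical: one key, its possible values, and its combine operation."""
    name: str
    values: Tuple[Any, ...]
    op: CombineOp


@dataclass
class Pattern:
    """Hypothetical: everything about how a set of files is organized."""
    format_function: Callable[..., str]  # maps key values to a filename
    dims: List[KeyDim]

    def items(self) -> Iterator[Tuple[Dict[str, Any], str]]:
        """Yield (keys, filename) for every combination of key values."""
        names = [d.name for d in self.dims]
        for combo in product(*(d.values for d in self.dims)):
            keys = dict(zip(names, combo))
            yield keys, self.format_function(**keys)

    def dims_with(self, op: CombineOp) -> List[str]:
        """Return the names of the keys that use the given combine operation."""
        return [d.name for d in self.dims if d.op is op]


def make_filename(variable: str, time: int) -> str:
    # Placeholder URL, for illustration only.
    return f"https://example.com/data/{variable}_{time:04d}.nc"


pattern = Pattern(
    format_function=make_filename,
    dims=[
        KeyDim("variable", ("temp", "salt"), CombineOp.MERGE),
        KeyDim("time", (1, 2, 3), CombineOp.CONCAT),
    ],
)


def check_supported(pattern: Pattern) -> None:
    """A recipe inspects the pattern and rejects layouts it cannot handle."""
    concat_dims = pattern.dims_with(CombineOp.CONCAT)
    if len(concat_dims) != 1:
        raise ValueError(
            f"expected exactly one concat key, got {concat_dims}"
        )


check_supported(pattern)
for keys, filename in pattern.items():
    print(keys, filename)
```

With something like this, a single recipe class could iterate over `pattern.items()` and branch on the combine operations, rather than encoding the file layout in separate recipe classes.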
As I mentioned in today's coordination meeting, this general direction feels more approachable to me than the existing design. These Pattern objects effectively become a form of configuration, which seems appropriate given how variable we expect them to be and how many of them we anticipate seeing over time. This is, of course, the viewpoint of someone entirely new to the library; others more familiar with the internals may have very good reasons that it should not change.