Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SCHEMA] Add objects and rules for sessions and scans files #985

Closed
wants to merge 9 commits into from

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented Jan 22, 2022

Closes #978. This continues from Thursday's schema conversation.

Changes proposed:

  • Add object definitions for the scans and sessions files to objects/suffixes.yaml.
  • Create a new rules file for "tabular metadata files", such as participants.tsv, scans.tsv, and sessions.tsv files. The file is currently called tabular_metadata.yaml.
    • The proposed structure formalizes the assumption that presence or absence of allowed entities also determines where they should be located in the dataset (e.g., if a file requires the sub entity, then it MUST go in the subject subfolder).
  • Remove cases of required: false from rules/top_level_files.yaml, given that the default should be false moving forward.
  • Move object definitions in objects/top_level_files.yaml to objects/suffixes.yaml. These filenames are essentially suffixes without entities, even though some of them have underscores in the names (e.g., dataset_description), which are normally prohibited for suffixes.
    • I just think this organization is a little cleaner, overall.

To do:

  • Discuss the proposed changes.
  • Update the rendering code to work with the new rules.
  • Add a test or two for the new rules.

@tsalo tsalo added discussion ongoing discussion schema Issues related to the YAML schema representation of the specification. Patch version release. labels Jan 22, 2022
@TheChymera
Copy link
Collaborator

TheChymera commented Jan 23, 2022

@tsalo
@sappelhoff

I was also working on this and trying to make it work with our validator.
I created a draft PR (including the suffix description from here): #987

If the suggestion was to treat these files similarly to datatypes and not create a separate YAML file type for them, perhaps the YAML documenting them can live inside the datatype directory? Might also be good as an intermediary step, i.e. those are files which behave the same in as far as for the time being others still do not.

The only issue which I see with that, is that the datatype file name would be assumed as an intermediary subdirectory, i.e. automatic path constructors as used by validators or by render.py would create /sub-<sub>/tabular_metadata/sub-<sub>_sessions.tsv instead of /sub-<sub>/sub-<sub>_sessions.tsv. This can of course easily be solved with a hard-coded if statement (which would be a counter-argument for keeping them in the same place), or maybe the datatype can be queried against rules/modalities.yaml, so as to only create a subdirectory if the datatype is a modality.

@TheChymera
Copy link
Collaborator

TheChymera commented Jan 26, 2022

In as far as anybody finds this relevant at this point, this is how we could ~seamlessly integrate directory addition with all the other datatypes:

https://github.com/TheChymera/bids-specification/blob/e92d094a20135992cc4f9d8c7ee27e0c94bc1316/tools/schemacode/schemacode/validator.py#L81-L103

This is addressing the former point I raised about datatypes which require a subdirectory, assuming all modality datatypes require a subdirectory: https://github.com/TheChymera/bids-specification/blob/e92d094a20135992cc4f9d8c7ee27e0c94bc1316/tools/schemacode/schemacode/validator.py#L216-L222

I've tried to base this code closely off of render.py and maybe we could use similar functions for both (or the same, if there's an easy way to output either regex or human-readable styling)... in any case this approach is usable with the logic seen here:

def make_filename_template(schema, n_dupes_to_combine=6, **kwargs):

@tsalo
Copy link
Member Author

tsalo commented Jan 28, 2022

If the suggestion was to treat these files similarly to datatypes and not create a separate YAML file type for them, perhaps the YAML documenting them can live inside the datatype directory? Might also be good as an intermediary step, i.e. those are files which behave the same in as far as for the time being others still do not.

I think I'd prefer to avoid having them in the datatypes folder and to instead pursue adding a datatype field, as proposed by (I believe) @effigies. Then, these files would either have datatype: None or just nothing, with the default being that datatype is None.

In as far as anybody finds this relevant at this point, this is how we could ~seamlessly integrate directory addition with all the other datatypes:

https://github.com/TheChymera/bids-specification/blob/e92d094a20135992cc4f9d8c7ee27e0c94bc1316/tools/schemacode/schemacode/validator.py#L81-L103

That's awesome! I haven't had a chance to take a close look at either of your code snippets, but I will try to soon.

@TheChymera
Copy link
Collaborator

I'd prefer to avoid having them in the datatypes folder and to instead pursue adding a datatype field

@tsalo so you'd like to flatten the datatype file hierarchy as well and have everything live under rules/ ?

@TheChymera
Copy link
Collaborator

I've tried to base this code closely off of render.pyand maybe we could use similar functions for both

Also, do you have any input on this? I think it would be cool if we could have a minimal validator which draws upon the BIDS documentation functions and is thereby in sync with it by design.

@codecov
Copy link

codecov bot commented Jan 31, 2022

Codecov Report

Merging #985 (859cafc) into master (51e9f1b) will increase coverage by 1.07%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #985      +/-   ##
==========================================
+ Coverage   34.05%   35.13%   +1.07%     
==========================================
  Files           8        8              
  Lines         834      834              
==========================================
+ Hits          284      293       +9     
+ Misses        550      541       -9     
Impacted Files Coverage Δ
tools/schemacode/schemacode/_version.py 38.90% <0.00%> (+2.59%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e53a367...859cafc. Read the comment docs.

@tsalo
Copy link
Member Author

tsalo commented Jan 31, 2022

@tsalo so you'd like to flatten the datatype file hierarchy as well and have everything live under rules/ ?

@TheChymera I meant more that the subfolders would still exist, but for convenience, rather than as an actual source of information.

Also, do you have any input on this? I think it would be cool if we could have a minimal validator which draws upon the BIDS documentation functions and is thereby in sync with it by design.

That sounds like a good idea, and the code itself looks solid. I think the actual discussion of the minimal validator should probably be a part of the schema meeting, but using similar patterns for rendering and regex construction seems like a good plan.

@tsalo
Copy link
Member Author

tsalo commented Feb 17, 2022

Closing this in favor of #1006.

@tsalo tsalo closed this Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion ongoing discussion schema Issues related to the YAML schema representation of the specification. Patch version release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sessions.tsv representation in YAML
3 participants