SCHEMA: Proposals to standardize schema/objects/*.yaml formats #1088

rwblair · 2022-04-21T20:15:20Z

From conversations in #1086 the key values and name fields are not used entirely consistently in the schema objects.
columns.yaml uses the name field value as the actual header column name we look for in tsv headers.
entities.yaml uses the entity field value as the literal entity that we look for in file names
suffixes.yaml uses the entries key as the literal suffix that we look for in file names.

Except for columns.yaml name is used as pretty printable/user friendly display name for the object.

The key used for an object is always how we refer to an object within the specification.

suffix and columns should have the values that we look for in datasets moved from their respective places into a new field.

Proposal 1 is to follow entity suite and have a special key-name unique to that object type. So from columns.yaml this:

acq_time__scans:
  name: acq_time
  description: |
    Acquisition time refers to when the first data point in each run was acquired.

Becomes:

acq_time__scans:
  name: Acquisition Time
  column: acq_time
  description: |
    Acquisition time refers to when the first data point in each run was acquired.

Proposal 2 would be to standardize on a single key name across object to represent in dataset values. If we followed columns suite and used name for this, pretty print values could shift to display_name:

# suffix
TwoPE:  # some unambiguous reference that works for MATLAB and ancpBIDS
  name: 2PE  # The actual value used in the specification
  display_name: 2-photon excitation microscopy  # The pretty name
  description: |
    2-photon excitation microscopy imaging data

# entity
inversion:  # also used for variable names in pybids
  name: inv
  display_name: Inversion Time
  description: |
    If files belonging to an entity-linked file collection are acquired at different
    inversion times, the `_inv-<index>` key/value pair MUST be used to distinguish
    individual files.
    This entity represents the `"InversionTime` metadata field. Please note that the `<index>`
    denotes the number/index (in the form of a nonnegative integer), not the `"InversionTime"`
    value which needs to be stored in the field `"InversionTime"` of the separate JSON file.
  type: string
  format: index

# column
type__electrodes:
  name: type
  display_name: Electrode Type
  description: |
    Type of the electrode (for example, cup, ring, clip-on, wire, needle).
  type: string

Instead of name there may be some other more appropriate key name to use, some floated during the 21.04.2022 schema meeting were alias, value, shorthand, etc but none felt quite right.

I would like to solicit input on the proposals and other potential solutions.

The text was updated successfully, but these errors were encountered:

rwblair · 2022-04-21T20:17:45Z

@bids-standard/schema-users

tsalo · 2022-04-21T20:22:57Z

Also want to throw in an extension example:

objects:

any:
  name: ".*"
  display_name: Any Extension
  description: |
    Any extension is allowed

none:
  name: ""
  display_name: No Extension
  description: |
    A file with no extension.

rules/datatypes/:

- suffixes:
  - headshape
  extensions:
  - any
  - none
  entities:
    subject: required
    session: optional
    acquisition: optional

effigies · 2022-04-21T20:30:30Z

I think there was one additional proposal, which is to distinguish datatype, suffix and extension as special kinds, having a "value" rather than a name:

# Suffix
TwoPE:  # some unambiguous reference that works for MATLAB and ancpBIDS
  value: 2PE  # The actual value used in the specification
  display_name: 2-photon excitation microscopy  # The pretty name
  description: |
    2-photon excitation microscopy imaging data

# Datatype
anatomical:
  value: anat
  display_name: Anatomical Magnetic Resonance Imaging
  description: |
    Magnetic resonance imaging sequences designed to characterize static, anatomical features.

# Extension
bdf:
  value: ".bdf"
  display_name: Biosemi Data Format
  description: |
    A [Biosemi](https://www.biosemi.com/) Data Format file.

    Each recording consists of a single `.bdf` file.
    [`bdf+`](https://www.teuniz.net/edfbrowser/bdfplus%20format%20description.html) files are permitted.
    The capital `.BDF` extension MUST NOT be used.

Meanwhile entities, columns and metadata could follow the convention 2:

# entity
inversion:  # also used for variable names in pybids
  name: inv
  display_name: Inversion Time
  description: |
    If files belonging to an entity-linked file collection are acquired at different
    inversion times, the `_inv-<index>` key/value pair MUST be used to distinguish
    individual files.
    This entity represents the `"InversionTime` metadata field. Please note that the `<index>`
    denotes the number/index (in the form of a nonnegative integer), not the `"InversionTime"`
    value which needs to be stored in the field `"InversionTime"` of the separate JSON file.
  type: string
  format: index

# column
type__electrodes:
  name: type
  display_name: Electrode Type
  description: |
    Type of the electrode (for example, cup, ring, clip-on, wire, needle).
  type: string

# Metadata
AcquisitionDuration:
  name: AcquisitionDuration
  display_name: Acquisition Duration
  description: |
    Duration (in seconds) of volume acquisition.
    Corresponds to DICOM Tag 0018, 9073 `Acquisition Duration`.
    This field is mutually exclusive with `"RepetitionTime"`.
  type: number
  exclusiveMinimum: 0
  unit: s

That said, metadata is also a bit of its own thing as well, since they are all valid identifiers and we never reference them except as literals in the spec, "<key>" == <key>.name == <key>.display_name. So maybe we should not try to uniformize it in this process. Edit: After discussion with @tsalo, there could be times when we want a unique metadata key, and a display name might be useful for other renderers than the spec.

TheChymera · 2022-04-21T20:51:48Z

This looks good, and making sure everything in rules has an object definition reduces the potential for confusion — at the cost of requiring more lookup to contribute (i.e. if you want to add a new extension you now need to edit two files, at least). However I'm not sure whether for the special case of extension it really makes sense. Do we really need to ship documentation for what .json means? We don't define any of these formats here, we don't want to make them diverge from their base standard, and we don't want to have to keep track here if things in the standard change.

The tricky parts about the extension were None, /, and .*. None simply means false (which is a boolean value which programming languages automatically understand), / and its combination with other extensions (like .ngff/) are not really made easier by this, and .* will still need to be regexified explicitly.

Not opposed to this necessarily, just wanted to make sure I'm not missing anything or that we're not adding too much unnecessary lookup steps to the schema....

erdalkaraca · 2022-04-22T05:13:23Z

I would go for proposal 2: the term 'column' may lead to confusion without any further context.

From conversations in #1086 the key values and name fields are not used entirely consistently in the schema objects. columns.yaml uses the name field value as the actual header column name we look for in tsv headers. entities.yaml uses the entity field value as the literal entity that we look for in file names suffixes.yaml uses the entries key as the literal suffix that we look for in file names.

Except for columns.yaml name is used as pretty printable/user friendly display name for the object.

The key used for an object is always how we refer to an object within the specification.

suffix and columns should have the values that we look for in datasets moved from their respective places into a new field.

Proposal 1 is to follow entity suite and have a special key-name unique to that object type. So from columns.yaml this:
acq_time__scans:
  name: acq_time
  description: |
    Acquisition time refers to when the first data point in each run was acquired.
Becomes:
acq_time__scans:
  name: Acquisition Time
  column: acq_time
  description: |
    Acquisition time refers to when the first data point in each run was acquired.
Proposal 2 would be to standardize on a single key name across object to represent in dataset values. If we followed columns suite and used name for this, pretty print values could shift to display_name:
# suffix
TwoPE:  # some unambiguous reference that works for MATLAB and ancpBIDS
  name: 2PE  # The actual value used in the specification
  display_name: 2-photon excitation microscopy  # The pretty name
  description: |
    2-photon excitation microscopy imaging data

# entity
inversion:  # also used for variable names in pybids
  name: inv
  display_name: Inversion Time
  description: |
    If files belonging to an entity-linked file collection are acquired at different
    inversion times, the `_inv-<index>` key/value pair MUST be used to distinguish
    individual files.
    This entity represents the `"InversionTime` metadata field. Please note that the `<index>`
    denotes the number/index (in the form of a nonnegative integer), not the `"InversionTime"`
    value which needs to be stored in the field `"InversionTime"` of the separate JSON file.
  type: string
  format: index

# column
type__electrodes:
  name: type
  display_name: Electrode Type
  description: |
    Type of the electrode (for example, cup, ring, clip-on, wire, needle).
  type: string
Instead of name there may be some other more appropriate key name to use, some floated during the 21.04.2022 schema meeting were alias, value, shorthand, etc but none felt quite right.

I would like to solicit input on the proposals and other potential solutions.

effigies · 2022-05-12T19:52:42Z

My two-type proposal (#1088 (comment)) was accepted at today's meeting. A big PR that will be Schema-version-bumping will be needed soon.

soichih · 2022-05-12T20:42:06Z

Will the update be made to 1.7.1? Do you know when 1.7.1 gets released?

effigies · 2022-05-12T22:38:26Z

I expect to do it before the next release, since it's a pretty constrained task, but @sappelhoff or @franklin-feingold would know better when to expect another release. I've been out of the loop for regular spec maintenance in the last month or so.

franklin-feingold · 2022-05-12T23:27:41Z

I don't have a release in mind until another BEP is incorporated into the specification - unless folks feel a patch bump is needed before then

rwblair added opinions wanted Please read and offer your opinion on this matter schema-structure Changes to the fundamental organization/structure of the YAML schema. Minor version release. labels Apr 21, 2022

rwblair changed the title ~~Proposals to standardize schema/objects/*.yaml formats~~ SCHEMA: Proposals to standardize schema/objects/*.yaml formats Apr 21, 2022

rwblair mentioned this issue May 12, 2022

WIP add entities to context and validate them. bids-standard/legacy-validator#1456

Merged

effigies mentioned this issue May 26, 2022

[SCHEMA] Unify metadata #1107

Merged

yarikoptic mentioned this issue Jun 7, 2022

Roadmap for the next release (1.7.1 or 1.8.0?) #1118

Closed

effigies closed this as completed in #1107 Jul 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SCHEMA: Proposals to standardize schema/objects/*.yaml formats #1088

SCHEMA: Proposals to standardize schema/objects/*.yaml formats #1088

rwblair commented Apr 21, 2022 •

edited by effigies

Loading

rwblair commented Apr 21, 2022

tsalo commented Apr 21, 2022 •

edited

Loading

effigies commented Apr 21, 2022 •

edited

Loading

TheChymera commented Apr 21, 2022

erdalkaraca commented Apr 22, 2022

effigies commented May 12, 2022

soichih commented May 12, 2022

effigies commented May 12, 2022

franklin-feingold commented May 12, 2022

SCHEMA: Proposals to standardize schema/objects/*.yaml formats #1088

SCHEMA: Proposals to standardize schema/objects/*.yaml formats #1088

Comments

rwblair commented Apr 21, 2022 • edited by effigies Loading

rwblair commented Apr 21, 2022

tsalo commented Apr 21, 2022 • edited Loading

effigies commented Apr 21, 2022 • edited Loading

TheChymera commented Apr 21, 2022

erdalkaraca commented Apr 22, 2022

effigies commented May 12, 2022

soichih commented May 12, 2022

effigies commented May 12, 2022

franklin-feingold commented May 12, 2022

rwblair commented Apr 21, 2022 •

edited by effigies

Loading

tsalo commented Apr 21, 2022 •

edited

Loading

effigies commented Apr 21, 2022 •

edited

Loading