Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add check for case collision intollerance #49

Open
yarikoptic opened this issue Sep 13, 2021 · 8 comments
Open

add check for case collision intollerance #49

yarikoptic opened this issue Sep 13, 2021 · 8 comments
Labels
effort: medium Estimated medium effort task enhancement New feature or request impact: high Estimated high impact task

Comments

@yarikoptic
Copy link
Contributor

original proposal as a PR in BIDS: bids-standard/bids-specification#858 seems to be getting approved and if merged soon will enter a release after v1.6.0-203-g811e17e2 ,

@sappelhoff
Copy link
Member

bids-standard/bids-specification#858 has been merged, so the corresponding check in the validator could/should be worked on. Suggestions on where to start would be very welcome @rwblair :-)

@effigies
Copy link
Contributor

I think we need to think through the logic of the check before we worry about where to implement.

Here's a proposal that only cares about collisions within directories:

def check_dir(dirname):
    files = listdir(dirname)
    if len(set(fname.lower() for fname in files)) != len(files):
        # report collision
    for fname in files:
        if isdir(fname):
            check_dir(fname)

It should catch colliding subjects, but not something like sub-s01/anat/sub-S01_..., and it won't care about if one subject uses mixed cases for its acquisition parameters and another uses lowercase. Do we care about these? They might be caught by other checks, in any case.

Here's another edge case: sub-01/anat/sub-01_acq-HiRes_T1w.nii.gz and sub-01/func/sub-01_task-rest_acq-hires_bold.nii.gz. I think this this should not be flagged, as it's not a collision, but if we implemented something like a global collision test for each entity, it would be. And others might think it should be.

@yarikoptic
Copy link
Contributor Author

Since the issue is relatively rare, I think it might suffice to just operate on a full list of files/directories in the dataset

def check_paths(paths):
    lowered = defaultdict(list)
    for path in paths:
        path = path.rstrip('/')  # just to harnomize among possible directories with/without trailing /?
        lowered[path.lower()].append(path)
    for lower, origs in lowered.items():
        if len(origs) > 1:
            # report/return collision for `origs`  -> `lower`

note: paths better not to be "dereferenced" (so git-annex'ed symlinks remain symlinks)

@rwblair
Copy link
Member

rwblair commented Sep 28, 2021

Regardless of if the validator is run via browser or node we create a dictionary with all the files as keys in the fileArrayToObject function in utils/files/readDir.js. Filenames could be normalized here and we could check for collisions, but we currently do not create/return issues at this stage of validation.

There is precedent for some validation happening in the ./utils/files/ directory with /utils/files/validateMisc.js. So one option would be adding a new file to do this validation into ./utils/file/ and then calling the new validator function in the ./validators/bids/fullTest.js similar to how validateMisc does.

@effigies
Copy link
Contributor

Would either of these checks catch @mattcieslak's motivating use case?

@nellh
Copy link
Member

nellh commented Sep 28, 2021

Regardless of if the validator is run via browser or node we create a dictionary with all the files as keys in the fileArrayToObject function in utils/files/readDir.js. Filenames could be normalized here and we could check for collisions, but we currently do not create/return issues at this stage of validation.

There is precedent for some validation happening in the ./utils/files/ directory with /utils/files/validateMisc.js. So one option would be adding a new file to do this validation into ./utils/file/ and then calling the new validator function in the ./validators/bids/fullTest.js similar to how validateMisc does.

This check should consider ignored files, so it will need to be near the start of fullTest before they are removed.

@mattcieslak
Copy link

In our case there were two subjects that had the same ID but with different capitalization (sub-ndar, sub-NDAR) in the root bids directory. All the files within their directories matched the case of their subject directory name

@yarikoptic
Copy link
Contributor Author

This check should consider ignored files, so it will need to be near the start of fullTest before they are removed.

"not sure" it should anyhow consult ignored files -- the problem from the collisions happens at the filesystem level, so even if e.g. sub-NDAR is ignored while sub-ndar not -- collision while downloading/datalad clone etc that dataset would result in trouble.

@effigies effigies transferred this issue from bids-standard/legacy-validator Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort: medium Estimated medium effort task enhancement New feature or request impact: high Estimated high impact task
Projects
None yet
Development

No branches or pull requests

6 participants