Joint BIDS-NWB metadata extraction. #1183

Merged
merged 15 commits into from
Jan 30, 2023

Conversation

TheChymera
Contributor

Closes: #1172

@TheChymera TheChymera added tests Add or improve existing tests BIDS NWB labels Jan 3, 2023
@TheChymera TheChymera changed the title Whitelisting new BIDS-NWB dataset Joint BIDS-NWB metadata extraction. Jan 3, 2023
@codecov

codecov bot commented Jan 3, 2023

Codecov Report

Base: 89.08% // Head: 89.16% // Increases project coverage by +0.07% 🎉

Coverage data is based on head (4aca649) compared to base (d958555).
Patch coverage: 81.81% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1183      +/-   ##
==========================================
+ Coverage   89.08%   89.16%   +0.07%     
==========================================
  Files          76       76              
  Lines        9448     9469      +21     
==========================================
+ Hits         8417     8443      +26     
+ Misses       1031     1026       -5     
Flag         Coverage Δ
unittests    89.16% <81.81%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                    Coverage Δ
dandi/tests/fixtures.py           97.59% <40.00%> (-1.01%) ⬇️
dandi/metadata.py                 87.55% <83.33%> (+0.08%) ⬆️
dandi/files/bids.py               97.47% <100.00%> (+2.52%) ⬆️
dandi/tests/test_metadata.py      100.00% <100.00%> (ø)
dandi/tests/test_files.py         100.00% <0.00%> (ø)
dandi/files/bases.py              78.72% <0.00%> (+1.41%) ⬆️
dandi/support/threaded_walk.py    94.82% <0.00%> (+1.72%) ⬆️


@TheChymera
Contributor Author

TheChymera commented Jan 4, 2023

@yarikoptic any idea why this test requires Docker? Does it need some upload function to generate the SampleDandiset? (I do have Docker, of course; it is just very surprising that this test triggers it, and that the test gets skipped if I suspend the service.)

(dev) [deco]~/src/dandi-cli ❱ pytest -vvs dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
=========================================================== test session starts ============================================================
platform linux -- Python 3.10.9, pytest-7.2.0, pluggy-1.0.0 -- /usr/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/chymera/src/dandi-cli, configfile: tox.ini
plugins: pkgcore-0.12.18, mock-3.10.0
collected 1 item

dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration Cloning into '/tmp/pytest-of-chymera/pytest-6/gitrepo0'...
remote: Enumerating objects: 2490, done.
remote: Counting objects: 100% (2490/2490), done.
remote: Compressing objects: 100% (1786/1786), done.
remote: Total 2490 (delta 541), reused 1899 (delta 364), pack-reused 0
Receiving objects: 100% (2490/2490), 371.26 KiB | 1.66 MiB/s, done.
Resolving deltas: 100% (541/541), done.
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 5 (delta 0), reused 3 (delta 0), pack-reused 0
Receiving objects: 100% (5/5), 7.57 KiB | 7.57 MiB/s, done.
remote: Enumerating objects: 142, done.
remote: Counting objects: 100% (142/142), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 142 (delta 38), reused 121 (delta 36), pack-reused 0
Receiving objects: 100% (142/142), 361.83 KiB | 2.62 MiB/s, done.
Resolving deltas: 100% (38/38), done.
errors pretty printing info
SKIPPED (docker engine not running)

=========================================================== slowest 10 durations ===========================================================
3.09s setup    dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
0.00s teardown dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
============================================================ 1 skipped in 3.18s ============================================================
(dev) [deco]~/src/dandi-cli/ ❱ ag bids_nwb_dandiset -B 2 -A 3 dandi/tests/test_metadata.py
52-
53-
54:def test_bids_nwb_metadata_integration(bids_nwb_dandiset: SampleDandiset) -> None:
55:    metadata = get_metadata(bids_nwb_dandiset)
56-    print(metadata)
57-
58-

@TheChymera
Contributor Author

Hm, so I am hitting a digest issue again. Perhaps this PR could cautiously be extended to cover #1178 as well, since it might require sorting that bit out...

@jwodder sorry to ping you again, but I could use your insight on this. Did I make any obvious mistake in d99891f? I'm basically trying to generate a fake digest, following the code used so far in:

if use_fake_digest:

Alternatively, and in the context of the overall digest discussion, is there any way to just pull the plug on it in a more comprehensive manner? And if so would it be at all advisable?

@jwodder
Member

jwodder commented Jan 6, 2023

@TheChymera Print out the value of metadata before this line and report back.

> is there any way to just pull the plug on it in a more comprehensive manner? And if so would it be at all advisable?

I'm not sure what you mean. dandi-schema requires a digest in order to construct a BareAsset. If you want to change that, talk to Satra.

@TheChymera
Contributor Author

TheChymera commented Jan 10, 2023

@jwodder

(dev) [deco]~/src/dandi-cli ❱ pytest -vvs dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
=========================================================== test session starts ============================================================
platform linux -- Python 3.10.9, pytest-7.2.0, pluggy-1.0.0 -- /usr/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/chymera/src/dandi-cli, configfile: tox.ini
plugins: pkgcore-0.12.18, mock-3.10.0
collected 1 item

dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration Cloning into '/tmp/pytest-of-chymera/pytest-9/gitrepo0'...
remote: Enumerating objects: 2490, done.
remote: Counting objects: 100% (2490/2490), done.
remote: Compressing objects: 100% (1786/1786), done.
remote: Total 2490 (delta 541), reused 1898 (delta 364), pack-reused 0
Receiving objects: 100% (2490/2490), 371.24 KiB | 1.39 MiB/s, done.
Resolving deltas: 100% (541/541), done.
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 5 (delta 0), reused 3 (delta 0), pack-reused 0
Receiving objects: 100% (5/5), 7.57 KiB | 7.57 MiB/s, done.
remote: Enumerating objects: 142, done.
remote: Counting objects: 100% (142/142), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 142 (delta 38), reused 121 (delta 36), pack-reused 0
Receiving objects: 100% (142/142), 361.83 KiB | 2.68 MiB/s, done.
Resolving deltas: 100% (38/38), done.
None
AAAAAAAAAAAAAAAAAAAAAAAA
Digest(algorithm=<DigestType.dandi_etag: 'dandi:dandi-etag'>, value='00000000000000000000000000000000-1')
{'schemaKey': 'Asset', 'schemaVersion': '0.6.3', 'access': [{'schemaKey': 'AccessRequirements', 'status': 'dandi:OpenAccess'}], 'wasGeneratedBy': [{'schemaKey': 'Session', 'identifier': 'postimp', 'name': 'postimp'}, Activity(id='urn:uuid:955453a3-1239-4ef4-841b-71878266f56f', schemaKey='Activity', identifier=None, name='Metadata generation', description='Metadata generated by DANDI cli', startDate=None, endDate=None, wasAssociatedWith=[Software(id=None, schemaKey='Software', identifier='RRID:SCR_019009', name='DANDI Command Line Interface', version='0.48.0+10.gd99891f.dirty', url=HttpUrl('https://github.com/dandi/dandi-cli', ))], used=None)], 'wasAttributedTo': [{'schemaKey': 'Participant', 'identifier': '01'}], 'digest': {}, 'dateModified': datetime.datetime(2023, 1, 10, 11, 45, 18, 11343, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400), 'EST')), 'blobDateModified': datetime.datetime(2023, 1, 10, 11, 45, 16, 932084, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400), 'EST')), 'contentSize': 19664, 'encodingFormat': 'application/octet-stream', 'path': 'sub-01_ses-postimp_task-seizure_run-01_ieeg.nwb'}
FAILED

================================================================= FAILURES =================================================================
____________________________________________________ test_bids_nwb_metadata_integration ____________________________________________________
dandi/tests/test_metadata.py:66: in test_bids_nwb_metadata_integration
    metadata = get_metadata(file_path)
/usr/lib/python3.10/site-packages/fscacher/cache.py:152: in fingerprinter
    ret = fingerprinted(*args, **kwargs_)
/usr/lib/python3.10/site-packages/joblib/memory.py:594: in __call__
    return self._cached_call(args, kwargs)[0]
/usr/lib/python3.10/site-packages/joblib/memory.py:537: in _cached_call
    out, metadata = self.call(*args, **kwargs)
/usr/lib/python3.10/site-packages/joblib/memory.py:779: in call
    output = self.func(*args, **kwargs)
/usr/lib/python3.10/site-packages/fscacher/cache.py:98: in fingerprinted
    return f(path, *args, **kwargs)
dandi/metadata.py:103: in get_metadata
    path_metadata = df.get_metadata(digest=digest)
dandi/files/bids.py:230: in get_metadata
    bids_metadata = BIDSAsset.get_metadata(self)
dandi/files/bids.py:201: in get_metadata
    return BareAsset(**metadata)
pydantic/main.py:342: in pydantic.main.BaseModel.__init__
    ???
E   pydantic.error_wrappers.ValidationError: 1 validation error for BareAsset
E   digest
E     A non-zarr asset must have a dandi-etag. (type=value_error)
------------------------------------------------------------ Captured log call -------------------------------------------------------------
WARNING  bids-schema:validator.py:594 BIDSVersion `1.7.0` is less than the minimal working `schema`. Falling back to `schema`. To force the usage of earlier versions specify them explicitly when calling the validator.
=========================================================== slowest 10 durations ===========================================================
2.23s setup    dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
1.06s call     dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
0.00s teardown dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
========================================================= short test summary info ==========================================================
FAILED dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration - pydantic.error_wrappers.ValidationError: 1 validation error for BareAsset
============================================================ 1 failed in 3.55s =============================================================
(dev) [deco]~/src/dandi-cli ❱ git rev-parse HEAD
67a63a2e8d1333766c0ca7d589a1816531c5e5b1

It appears we do indeed have the etag, Digest(algorithm=<DigestType.dandi_etag: 'dandi:dandi-etag'>, value='00000000000000000000000000000000-1'), but for some reason it is not being recognized 🤔
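
For context, the metadata dict printed above carries 'digest': {}, so the computed Digest object never made it into the record being validated. A minimal sketch of that kind of check, assuming a plain dataclass stand-in (AssetSketch is hypothetical and is not dandi-schema's BareAsset; only the key string and the error message are taken from the output above):

```python
from dataclasses import dataclass, field
from typing import Dict

# Key string as it appears in the Digest repr in the log above.
DANDI_ETAG = "dandi:dandi-etag"


@dataclass
class AssetSketch:
    """Hypothetical stand-in for a schema model that requires an etag digest."""

    path: str
    digest: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # An empty digest mapping fails, even if a digest was computed elsewhere
        # and simply never passed along to this constructor.
        if DANDI_ETAG not in self.digest:
            raise ValueError("A non-zarr asset must have a dandi-etag.")


def builds_ok(path: str, digest: Dict[str, str]) -> bool:
    """Return True if the record validates, False if validation raises."""
    try:
        AssetSketch(path=path, digest=digest)
    except ValueError:
        return False
    return True
```

Under this assumption, a fake digest only helps if it is actually placed into the mapping handed to the model; computing it and then calling a method that ignores it yields the same error as having no digest at all.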

@jwodder
Member

jwodder commented Jan 10, 2023

@TheChymera Pass --showlocals to pytest and report the results.

@TheChymera
Contributor Author

@jwodder
Member

jwodder commented Jan 11, 2023

@TheChymera This should fix it:

diff --git a/dandi/files/bids.py b/dandi/files/bids.py
index d2238c3..3c0d910 100644
--- a/dandi/files/bids.py
+++ b/dandi/files/bids.py
@@ -226,7 +226,7 @@ class NWBBIDSAsset(BIDSAsset, NWBAsset):
         digest: Optional[Digest] = None,
         ignore_errors: bool = True,
     ) -> BareAsset:
-        bids_metadata = BIDSAsset.get_metadata(self)
+        bids_metadata = BIDSAsset.get_metadata(self, digest, ignore_errors)
         nwb_metadata = NWBAsset.get_metadata(self, digest, ignore_errors)
         return BareAsset(
             **{**bids_metadata.dict(), **nwb_metadata.dict(exclude_none=True)}
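
A minimal sketch of why forwarding the arguments matters, assuming simplified stand-in classes (BaseAssetSketch, BidsLike, NwbLike, Combined and drop_none are hypothetical names, not the dandi-cli classes). Each parent method is called explicitly, so any argument that is not passed along falls back to its default:

```python
from typing import Any, Dict, Optional


def drop_none(d: Dict[str, Any]) -> Dict[str, Any]:
    """Plain-dict analogue of excluding None-valued fields before merging."""
    return {k: v for k, v in d.items() if v is not None}


class BaseAssetSketch:
    def __init__(self, path: str) -> None:
        self.path = path


class BidsLike(BaseAssetSketch):
    def get_metadata(
        self, digest: Optional[str] = None, ignore_errors: bool = True
    ) -> Dict[str, Any]:
        # Mimic a schema that refuses to build a record without a digest.
        if digest is None:
            raise ValueError("a digest is required")
        return {"path": self.path, "digest": digest, "subject": "01", "species": None}


class NwbLike(BaseAssetSketch):
    def get_metadata(
        self, digest: Optional[str] = None, ignore_errors: bool = True
    ) -> Dict[str, Any]:
        if digest is None:
            raise ValueError("a digest is required")
        return {"path": self.path, "digest": digest, "subject": None, "species": "mouse"}


class Combined(BidsLike, NwbLike):
    def get_metadata(
        self, digest: Optional[str] = None, ignore_errors: bool = True
    ) -> Dict[str, Any]:
        # Forward the same arguments to BOTH parents; calling
        # BidsLike.get_metadata(self) alone would use digest=None and raise.
        bids = BidsLike.get_metadata(self, digest, ignore_errors)
        nwb = NwbLike.get_metadata(self, digest, ignore_errors)
        # Later keys win, so non-None NWB values override the BIDS ones.
        return {**bids, **drop_none(nwb)}
```

In this sketch the None-valued NWB fields are dropped before merging, so they cannot overwrite values that the BIDS side did provide.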

@TheChymera
Contributor Author

Although this doesn't really touch the logic of BIDS validation, this PR apparently introduces 2 new failures in existing tests... I tried to debug this yesterday (I have a commit littered with print calls, but it really didn't help me narrow it down).

Checking the logs, I get:

2023-01-17T07:41:56-0500 [DEBUG ] dandi 4324:140070214657856 Problem obtaining metadata for asl003/sub-Sub1/anat/sub-Sub1_T1w.nii.gz: Unable to get metadata from non-BIDS, non-NWB asset.

And I have absolutely no clue why... I'll continue looking, but if you have any ideas @yarikoptic @jwodder let me know.

@TheChymera
Contributor Author

I think I figured it out.

@TheChymera TheChymera marked this pull request as ready for review January 17, 2023 16:10
@TheChymera TheChymera requested a review from yarikoptic January 17, 2023 16:10
Member

@yarikoptic yarikoptic left a comment

Left some suggestions, which I have yet to pursue "deeper" myself, since there seems to be an unclear stack of semantics around what extracts which metadata.

dandi/metadata.py (resolved)
dandi/metadata.py (resolved)
dandi/metadata.py (outdated, resolved)
@TheChymera
Contributor Author

@jwodder could we merge?

Member

@yarikoptic yarikoptic left a comment

Just left one suggestion to adopt, to avoid an obscure mix of / and \ in paths on Windows. The rest is OK; let's proceed once the suggestion is adopted.
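
The suggestion itself is not shown in this thread. One common way to keep relative paths separator-consistent across platforms, sketched here with the standard-library pathlib (to_forward_slashes is a hypothetical helper name, and the example path is illustrative):

```python
from pathlib import PureWindowsPath


def to_forward_slashes(relpath: str) -> str:
    """Normalize a relative path that may mix '/' and '\\' to forward slashes.

    PureWindowsPath accepts both separators when parsing, and as_posix()
    renders the result with '/' only, regardless of the host platform.
    """
    return PureWindowsPath(relpath).as_posix()


if __name__ == "__main__":
    # A path assembled partly by os.path.join on Windows and partly by hand.
    print(to_forward_slashes("sub-01\\ses-postimp/sub-01_ses-postimp_ieeg.nwb"))
```

Using the pure path classes keeps the behavior identical on Linux and Windows, which makes the normalization easy to cover in tests that run on either platform.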

@yarikoptic yarikoptic merged commit 94c862f into master Jan 30, 2023
@yarikoptic yarikoptic deleted the bidsnwb branch January 30, 2023 21:05
@github-actions

🚀 PR was released in 0.49.0 🚀

Labels
BIDS NWB released tests Add or improve existing tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate .nwb files with our nwb validation routines even within BIDS dandisets
3 participants