How to search through NWB metadata (for the IBL dataset) #100
-
@GaelleChapuis These types of detailed metadata-based asset-level searches can be done using the Python API. Some of this metadata is already extracted into asset-level metadata, and the rest will require opening the NWB file. See here for existing examples. I'll see if I can put together scripts that do the types of queries you are asking for.
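As a rough sketch of the asset-level side, the helpers below work on a raw asset metadata dictionary (the kind returned by `asset.get_raw_metadata()` in the dandi Python client). They only use the `encodingFormat` and `contentUrl` fields. The helper names, the sample record, and the assumption that the S3 link can be recognized by the substring `s3.amazonaws.com` are illustrative, not part of the DANDI API.

```python
NWB_ENCODING = "application/x-nwb"


def is_nwb_asset(metadata):
    """Return True if the raw asset metadata describes an NWB file."""
    return metadata.get("encodingFormat") == NWB_ENCODING


def pick_s3_url(metadata):
    """Return the first content URL that looks like an S3 link, else None.

    Assumption: S3 links contain 's3.amazonaws.com'. This avoids relying on
    the position of the S3 link inside the 'contentUrl' list.
    """
    for url in metadata.get("contentUrl", []):
        if "s3.amazonaws.com" in url:
            return url
    return None


# Hypothetical record, shaped like raw asset metadata (values are made up).
sample_metadata = {
    "path": "sub-example/sub-example_ses-example.nwb",
    "encodingFormat": "application/x-nwb",
    "contentUrl": [
        "https://api.example.org/assets/example-id/download/",
        "https://example-bucket.s3.amazonaws.com/blobs/example",
    ],
}

if is_nwb_asset(sample_metadata):
    print(pick_s3_url(sample_metadata))
```

Anything not present in the asset-level metadata still requires opening the NWB file itself.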
-
In many cases you'll be able to query the asset-level metadata in the DANDI API (i.e. the result of the `asset.get_raw_metadata()` line in the script below):

    from dandi.dandiapi import DandiAPIClient
    from tqdm.notebook import tqdm
    from pynwb import NWBHDF5IO
    import h5py
    import fsspec

    fs = fsspec.filesystem("http")


    def parse_metadata(s3_url, path):
        """Open a remote NWB file and parse the desired metadata."""
        with fs.open(s3_url, "rb") as f:
            with h5py.File(f, "r") as file:
                with NWBHDF5IO(file=file, mode="r", load_namespaces=True) as io:
                    nwbfile = io.read()
                    # Build the result while the file is still open
                    return dict(
                        path=path,
                        institution=nwbfile.institution,
                        lab=nwbfile.lab,
                        left_video_path=nwbfile.acquisition["OriginalVideoLeftCamera"].external_file[0],
                        related_publications=nwbfile.related_publications,
                    )


    # Iterate over the assets. If an asset is an NWB file, run `parse_metadata`
    # and accumulate the results.
    client = DandiAPIClient()
    dandiset = client.get_dandiset("000409")
    assets = list(dandiset.get_assets())

    results = []
    for asset in tqdm(assets[:20]):  # first 20 assets only
        metadata = asset.get_raw_metadata()
        if metadata["encodingFormat"] == "application/x-nwb":
            # The second content URL is taken to be the S3 link
            s3_url = metadata["contentUrl"][1]
            # Pass the asset path explicitly instead of reading a global
            results.append(parse_metadata(s3_url, metadata["path"]))

    results

I've added the
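Once `results` has been collected, the lab and publication queries reduce to plain filtering. The following is a minimal sketch, assuming each record has the `lab` and `related_publications` keys produced by `parse_metadata` above. The `filter_sessions` helper and the sample records (lab names, paths, DOIs) are hypothetical and only stand in for real output.

```python
def filter_sessions(results, lab=None, publication=None):
    """Keep records matching the given lab and/or publication substring.

    A criterion left as None is not applied.
    """
    matches = []
    for record in results:
        if lab is not None and record.get("lab") != lab:
            continue
        # related_publications may be None or a tuple/list of strings
        publications = record.get("related_publications") or ()
        if publication is not None and not any(publication in p for p in publications):
            continue
        matches.append(record)
    return matches


# Hypothetical records shaped like the output of parse_metadata (values made up).
example_results = [
    {"path": "sub-a/ses-1.nwb", "lab": "lab_a",
     "related_publications": ("https://doi.org/10.0000/example-a",)},
    {"path": "sub-b/ses-2.nwb", "lab": "lab_b",
     "related_publications": ("https://doi.org/10.0000/example-a",
                              "https://doi.org/10.0000/example-b")},
    {"path": "sub-c/ses-3.nwb", "lab": "lab_a", "related_publications": None},
]

lab_a_paths = [r["path"] for r in filter_sessions(example_results, lab="lab_a")]
example_b_paths = [r["path"] for r in filter_sessions(example_results, publication="example-b")]
print(lab_a_paths)
print(example_b_paths)
```

This still requires opening every NWB file once to build `results`, so caching the parsed records would be worthwhile for the full dandiset.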
-
@GaelleChapuis Thanks for getting in touch! As you look over the NWB file versions of the IBL data, let me know of any other little details like this that I can look into including, modifying, or otherwise fixing.
Note that the 'repeated site' experiment has not been specifically converted yet. The previous conversion focused on the brain wide map, though as memory serves a fair number of sessions overlapped the two. As I recall from my notes on the matter, there may also be some additional variability in the trial structures for the sessions outside of the BWM, which I'd need to dig deeper into in order to resolve the mapping properly.
Anyway, the field I believe you're asking about is the 'trajectory_estimate' property of a particular probe ID. I did not include that information in the first round of NWB mapping since all electrodes/sorted units have precise CCF coordinates (and coordinates in the other two atlases as well), which seemed more informative overall. But if you're saying this is intended to be used as a summary value, search field, or other high-level filter of session metadata, then I'd be happy to look at including it when I do a reconversion to include the passive data (which now appears to have been released, is that correct?).
I will note that an eventual goal to help manage this navigation specific to IBL, though not yet achieved, was to have separate DANDI sets corresponding to each major segment of the data release (behavior, brain wide map, repeated site, spike sorting benchmark). The plan was also to link these via DANDI's 'associated projects' metadata feature.
-
Hello, thank you both for your answers, and sorry for the late reply; I was away.
To get a feel for the user experience, I installed the DANDI API on my machine and ran the lines of code you provided above, @bendichter, putting a
Here is what is returned from the metadata: (screenshot not preserved)
How feasible is it to add fields that we know external users will be looking for?
To answer Cody:
-
Follow-up question: I cannot seem to find the equivalent of the
Thank you for your help!
-
Yes, that would be helpful.
-
Hello,
What is important is to distinguish between the two search cases, "session" and "insertion". I hope this makes sense; let me know if not.
Here are useful queries to begin with, each with the relevant ONE documentation linked below.
Search sessions with a given:
Search insertions with a given:
-
Hello,
When working with the IBL dataset, there are a couple of queries we routinely run via ONE (typically using `one.alyx.rest`) to search for datasets according to their metadata, and I was wondering how to do the same using NWB. Our archive is: https://dandiarchive.org/dandiset/000409
For example:
If I wanted to get all the sessions acquired in a given lab, how can I do it (and I mean that without looking at the folder naming)?
If I wanted to know all the insertions acquired at X planned coordinates (e.g. what we call the “repeated site”), how can I get the session folder + know which ephys data to use within it?
How can I get sessions associated with a particular publication tag (there are multiple ones already, associated with different papers)?
Thank you for your help !