
Dev #293

Merged
merged 19 commits into from
May 28, 2024
13 changes: 3 additions & 10 deletions README.md
@@ -47,18 +47,11 @@ new_intervalset = intervalset[0]

See the [documentation](https://pynapple-org.github.io/pynapple/reference/core/interval_set/) for more details.


### pynapple >= 0.4

Starting with 0.4, pynapple relies on the [numpy array container](https://numpy.org/doc/stable/user/basics.dispatch.html) approach instead of Pandas for time series. Pynapple built-in functions remain the same, except for functions inherited from Pandas. Typically, this line of code in `pynapple<=0.3.6`:
```python
meantsd = tsdframe.mean(1)
```
is now :
```python
meantsd = np.mean(tsdframe, 1)
```
in `pynapple>=0.4.0`. This allows for better handling of returned objects.
Starting with 0.4, pynapple relies on the [numpy array container](https://numpy.org/doc/stable/user/basics.dispatch.html) approach instead of Pandas for time series. Pynapple built-in functions remain the same, except for functions inherited from Pandas.

This allows for better handling of returned objects.

Additionally, it is now possible to define time series objects with more than 2 dimensions with `TsdTensor`. You can also look at this [notebook](https://pynapple-org.github.io/pynapple/generated/gallery/tutorial_pynapple_numpy/) for a demonstration of numpy compatibility.
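The numpy array container approach mentioned above can be sketched with a minimal, hypothetical container class (illustration only, not pynapple's actual code) that implements the `__array_function__` protocol — this is the mechanism that lets a call like `np.mean(tsdframe, 1)` dispatch to an object that is not an ndarray:

```python
import numpy as np

class MyContainer:
    """Hypothetical minimal numpy array container (illustration only)."""

    def __init__(self, d):
        self.d = np.asarray(d)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap container arguments, then let numpy run on the raw arrays.
        args = [a.d if isinstance(a, MyContainer) else a for a in args]
        return func(*args, **kwargs)

tsd = MyContainer([[1.0, 2.0], [3.0, 4.0]])
print(np.mean(tsd, 1))  # numpy dispatches through __array_function__
```

A real container would also wrap the result back into the container type; pynapple's objects do considerably more than this sketch.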

6 changes: 3 additions & 3 deletions docs/AUTHORS.md
@@ -4,15 +4,15 @@ Credits
Development Lead
----------------

- Guillaume Viejo <[email protected]>
- Guillaume Viejo <[email protected]>


Contributors
------------

- Edoardo Balzani <[email protected]>
- Adrien Peyrache
- Dan Levenstein
- Sofia Skromne Carrasco
- Sara Mahallati
- Gilberto Vite
- Davide Spalla
- Luigi Petrucco
6 changes: 6 additions & 0 deletions docs/HISTORY.md
@@ -8,6 +8,12 @@ Around 2016-2017, Luke Sjulson started *TSToolbox2*, still in Matlab and which i
In 2018, Francesco started neuroseries, a Python package built on Pandas. It was quickly adopted in Adrien's lab, especially by Guillaume Viejo, a postdoc in the lab. Gradually, the majority of the lab was using it and new functions were constantly added.
In 2021, Guillaume and other trainees in Adrien's lab decided to fork from neuroseries and started *pynapple*. The core of pynapple is largely built upon neuroseries. Some of the original changes to TSToolbox made by Luke were included in this package, especially the *time_support* property of all ts/tsd objects.

0.6.6 (Soon)
------------------

- Full lazy-loading for NWB files.
- Parameter `load_array` for time series to prevent loading zarr arrays.
- Function to merge a list of `TsGroup` objects.

0.6.5 (2024-05-14)
------------------
3 changes: 3 additions & 0 deletions docs/api_guide/README.md
@@ -0,0 +1,3 @@
# API guide

Guide to the `pynapple` API.
@@ -111,8 +111,14 @@
tsgroup = nap.TsGroup(my_ts)

print(tsgroup, "\n")
print(tsgroup[0], "\n") # dictionary like indexing returns directly the Ts object
print(tsgroup[[0, 2]]) # list like indexing

# %%
# Dictionary-like indexing directly returns the Ts object
print(tsgroup[0], "\n")

# %%
# List-like indexing
print(tsgroup[[0, 2]])

# %%
# Operations such as restrict can thus be directly applied to the TsGroup as well as other operations.
@@ -126,21 +132,83 @@
print(count)

# %%
# One advantage of grouping time series is that metainformation can be appended directly on an element-wise basis. In this case, we add labels to each Ts object when instantiating the group and after. We can then use this label to split the group. See the [TsGroup](https://peyrachelab.github.io/pynapple/core.ts_group/) documentation for a complete methodology for splitting TsGroup objects.

# One advantage of grouping time series is that metainformation can be added directly on an element-wise basis. In this case, we add labels to each Ts object when instantiating the group and after. We can then use this label to split the group. See the [TsGroup](https://peyrachelab.github.io/pynapple/core.ts_group/) documentation for a complete methodology for splitting TsGroup objects.
#
# First we create a pandas Series for the label.

label1 = pd.Series(index=list(my_ts.keys()), data=[0, 1, 0])

tsgroup = nap.TsGroup(my_ts, time_units="s", label1=label1)
tsgroup.set_info(label2=np.array(["a", "a", "b"]))
print(label1)

print(tsgroup, "\n")
# %%
# We can pass `label1` at the initialization step.

tsgroup = nap.TsGroup(my_ts, time_units="s", my_label1=label1)

print(tsgroup)

# %%
# Notice how the label has been added as one column when printing `tsgroup`.
#
# We can also add a label for each item in two different ways after initializing the `TsGroup` object.
# First with `set_info`:
tsgroup.set_info(my_label2=np.array(["a", "a", "b"]))

print(tsgroup)

# %%
# Notice that you can directly pass a numpy array as long as it is the same size as the `TsGroup`.
#
# We can also add new metadata by passing it as an item of the dictionary with a string key.
tsgroup["my_label3"] = np.random.randn(len(tsgroup))

print(tsgroup)

# %%
# Metadata columns can be viewed as attributes of `TsGroup`.

tsgroup.my_label1

# %%
# or with the `get_info` method.

newtsgroup = tsgroup.getby_category("label1")
print(newtsgroup[0], "\n")
print(newtsgroup[1])
tsgroup.get_info("my_label3")


# %%
# Finally, you can use the metadata to slice the `TsGroup` object.
#
# There are multiple ways to do this. You can use the `TsGroup` getter functions:
#
# - `getby_category(col_name)`: categorizes the metadata column and returns a `TsGroup` for each category.
#
# - `getby_threshold(col_name, value)`: thresholds the metadata column and returns a single `TsGroup`.
#
# - `getby_intervals(col_name, bins)`: digitizes the metadata column and returns a `TsGroup` for each bin.
#
# In this example, we categorize `tsgroup` with `my_label2`.

dict_of_tsgroup = tsgroup.getby_category("my_label2")

print(dict_of_tsgroup["a"], "\n")
print(dict_of_tsgroup["b"])

# %%
# Notice that `getby_threshold` directly returns a `TsGroup`.

tsgroup.getby_threshold("my_label1", 0.5)

# %%
# Similar operations can be performed using the attributes of `TsGroup` directly.
# For example, the previous line is equivalent to:

tsgroup[tsgroup.my_label1>0.5]

# %%
# You can also chain queries with attributes.

tsgroup[(tsgroup.my_label1==0) & (tsgroup.my_label2=="a")]
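Under the hood, these chained queries are plain numpy elementwise boolean operations. A numpy-only sketch (the arrays below stand in for the metadata columns and are illustrative, not pynapple code):

```python
import numpy as np

# Stand-ins for the metadata columns used above.
my_label1 = np.array([0.0, 1.0, 0.0])
my_label2 = np.array(["a", "a", "b"])

# `&` is elementwise AND; the parentheses are required because `&`
# binds tighter than `==` in Python.
mask = (my_label1 == 0) & (my_label2 == "a")
print(mask)                  # boolean mask over the group members
print(np.flatnonzero(mask))  # indices of the members kept
```

Indexing a `TsGroup` with such a boolean mask keeps only the members where the mask is `True`.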

# %%
# ***
# Time support
@@ -156,7 +224,7 @@
my_ts = {
0: nap.Ts(
t=np.sort(np.random.uniform(0, 100, 10)), time_units="s"
), # here a simple dictionnary
), # here a simple dictionary
1: nap.Ts(t=np.sort(np.random.uniform(0, 100, 20)), time_units="s"),
2: nap.Ts(t=np.sort(np.random.uniform(0, 100, 30)), time_units="s"),
}
@@ -165,11 +233,15 @@

tsgroup_with_time_support = nap.TsGroup(my_ts, time_support=time_support)

# %%
print(tsgroup, "\n")

# %%
print(tsgroup_with_time_support, "\n")

print(tsgroup_with_time_support.time_support) # accessing the time support
# %%
# Accessing the time support is an important feature of pynapple.
print(tsgroup_with_time_support.time_support)

# %%
# We can use `value_from` which, as its name indicates, assigns to every timestamp the closest value in time from another time series.
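A numpy-only sketch of that closest-in-time lookup (not pynapple's actual implementation) can be written with `np.searchsorted`; all names here are illustrative:

```python
import numpy as np

def closest_value_from(t_query, t_ref, d_ref):
    """For each query time, return the value of the reference series
    whose timestamp is closest (a sketch, not pynapple's implementation)."""
    idx = np.searchsorted(t_ref, t_query)
    idx = np.clip(idx, 1, len(t_ref) - 1)
    left, right = t_ref[idx - 1], t_ref[idx]
    # Step back one index where the left neighbour is strictly closer.
    idx -= (t_query - left) < (right - t_query)
    return d_ref[idx]

t_ref = np.array([0.0, 10.0, 20.0])
d_ref = np.array([100.0, 200.0, 300.0])
print(closest_value_from(np.array([1.0, 9.0, 16.0]), t_ref, d_ref))
```

The real method also restricts both series to a common time support before matching timestamps.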
@@ -47,7 +47,7 @@
# %%
# Here it shows all the subjects (in this case only A2929), all the sessions, and all of the derivatives folders. It also shows all the NPZ files that contain a pynapple object, and the NWB files.
#
# The object project behaves like a nested dictionnary. It is then easy to loop and navigate through a hierarchy of folders when doing analyses. In this case, we are gonna take only the session A2929-200711.
# The object project behaves like a nested dictionary. It is then easy to loop and navigate through a hierarchy of folders when doing analyses. In this case, we will take only the session A2929-200711.


session = project["sub-A2929"]["A2929-200711"]
176 changes: 176 additions & 0 deletions docs/api_guide/tutorial_pynapple_nwb.py
@@ -0,0 +1,176 @@
# coding: utf-8
"""
# NWB & Lazy-loading

Pynapple currently provides loaders for two data formats:

- `npz` with a special structure. You can check this [notebook](../tutorial_pynapple_io) for a description of the methods for saving/loading `npz` files.

- [NWB format](https://pynwb.readthedocs.io/en/stable/index.html#)

This notebook focuses on the NWB format. Additionally, it demonstrates the capabilities of pynapple for lazy-loading different formats.


The dataset in this example can be found [here](https://www.dropbox.com/s/pr1ze1nuiwk8kw9/MyProject.zip?dl=1).
"""
# %%
#

import numpy as np
import pynapple as nap

# %%
# NWB
# --------------
# When loading a NWB file, pynapple will walk through it and test the compatibility of each data structure with pynapple objects. If a data structure is incompatible, pynapple will ignore it. The class that deals with reading NWB files is [`nap.NWBFile`](../../../reference/io/interface_nwb/). You can pass the path to a NWB file or directly an opened NWB file. Alternatively you can use the function [`nap.load_file`](../../../reference/io/misc/#pynapple.io.misc.load_file).
#
#
# !!! note
#     Creating the NWB file is outside the scope of pynapple. The NWB file used here was created beforehand.
#     Multiple tools exist to create NWB files automatically. You can check [neuroconv](https://neuroconv.readthedocs.io/en/main/), [NWBGuide](https://nwb-guide.readthedocs.io/en/latest/) or even [NWBmatic](https://github.com/pynapple-org/nwbmatic).


data = nap.load_file("../../your/path/to/MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb")

print(data)

# %%
# Pynapple will give you a table with all the entries of the NWB file that are compatible with a pynapple object.
# When parsing the NWB file, nothing is loaded. The `NWBFile` keeps track of the position of the data within the NWB file with a key. You can see it with the attribute `key_to_id`.

data.key_to_id


# %%
# Loading an entry will get pynapple to read the data.

z = data['z']

print(data['z'])

# %%
# Internally, the `NWBFile` object has replaced the pointer to the data with the actual data.
#
# While it looks like pynapple has loaded the data, in fact it has not. By default, calling the NWB object will return an HDF5 dataset.

print(type(z.values))
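The load-on-first-access pattern described above can be sketched in plain Python (a hypothetical class, not pynapple's actual code): keep a loader per key and only call it the first time the entry is requested.

```python
class LazyFile:
    """Hypothetical sketch of lazy loading: one zero-argument loader per key."""

    def __init__(self, loaders):
        self._loaders = loaders  # key -> callable that reads the data
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            # First access: actually read, then replace the pointer
            # with the loaded data.
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

calls = []
lazy_data = LazyFile({"z": lambda: calls.append("load") or [1, 2, 3]})
print(lazy_data["z"], len(calls))  # first access triggers one load
print(lazy_data["z"], len(calls))  # second access hits the cache
```

In pynapple the cached value is itself an HDF5 dataset handle, so even the "loaded" entry still reads lazily from disk.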

# %%
# Notice that the time array is always loaded.

print(type(z.index.values))

# %%
# This is very useful in the case of large datasets that do not fit in memory. You can then get a chunk of the data that will actually be loaded.

z_chunk = z.get(670, 680) # getting 10s of data.

print(z_chunk)

# %%
# Data are now loaded.

print(type(z_chunk.values))

# %%
# You can still apply any high-level function of pynapple. For example, here we compute some tuning curves without preloading the dataset.

tc = nap.compute_1d_tuning_curves(data['units'], data['y'], 10)

print(tc)

# %%
# !!! warning
#     Care should still be taken when calling any pynapple function on a memory map. Pynapple does not implement any batching internally. Calling a high-level function of pynapple on a dataset that does not fit in memory will likely cause a memory error.
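Batching therefore has to be done by the caller. A numpy-only sketch of a chunked reduction over a memory map (the file here is a small synthetic stand-in for a recording too large to load at once):

```python
import os
import tempfile
import numpy as np

# Write a small binary file standing in for a large recording.
path = os.path.join(tempfile.mkdtemp(), "signal.dat")
np.arange(10_000, dtype=np.int16).tofile(path)

data = np.memmap(path, dtype=np.int16, mode="r")

# Accumulate the mean chunk by chunk so only one chunk is resident at a time.
chunk, total, n = 1000, 0.0, 0
for start in range(0, len(data), chunk):
    block = np.asarray(data[start:start + chunk], dtype=np.float64)
    total += block.sum()
    n += block.size

print(total / n)  # same value as data.mean(), without loading everything
```

The same pattern works for any associative reduction (sum, count, min/max); non-associative operations need more care.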

# %%
# To change this behavior, you can pass `lazy_loading=False` when instantiating the `NWBFile` object.
path = "../../your/path/to/MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"
data = nap.NWBFile(path, lazy_loading=False)

z = data['z']

print(type(z.d))


# %%
# Numpy memory map
# ----------------
#
# In fact, pynapple can work with any type of memory map. Here we read a binary file with [`np.memmap`](https://numpy.org/doc/stable/reference/generated/numpy.memmap.html).

eeg_path = "../../your/path/to/MyProject/sub-A2929/A2929-200711/A2929-200711.eeg"
frequency = 1250 # Hz
n_channels = 16
f = open(eeg_path, 'rb')
startoffile = f.seek(0, 0)
endoffile = f.seek(0, 2)
f.close()
bytes_size = 2
n_samples = int((endoffile-startoffile)/n_channels/bytes_size)
duration = n_samples/frequency
interval = 1/frequency

fp = np.memmap(eeg_path, np.int16, 'r', shape = (n_samples, n_channels))
timestep = np.arange(0, n_samples)/frequency

print(type(fp))
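The seek-to-end idiom above can equivalently use `os.path.getsize`. A stdlib-only sketch with a synthetic file (the sizes here are illustrative, not the tutorial's actual recording):

```python
import os
import tempfile
import numpy as np

n_channels, bytes_size, frequency = 16, 2, 1250  # int16 samples -> 2 bytes

# Create a small stand-in binary file: 2500 samples x 16 channels.
path = os.path.join(tempfile.mkdtemp(), "fake.eeg")
np.zeros((2500, n_channels), dtype=np.int16).tofile(path)

# os.path.getsize replaces the open/seek/close dance.
n_samples = os.path.getsize(path) // (n_channels * bytes_size)
duration = n_samples / frequency

print(n_samples, duration)  # 2500 samples -> 2.0 seconds
```

Either way, the sample count is just the file size divided by the bytes per frame (`n_channels * bytes_size`).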

# %%
# Instantiating a pynapple `TsdFrame` will keep the data as a memory map.

eeg = nap.TsdFrame(t=timestep, d=fp)

print(eeg)

# %%
# We can check the type of `eeg.values`.

print(type(eeg.values))


# %%
# Zarr
# --------------
#
# It is also possible to use a higher-level library like [zarr](https://zarr.readthedocs.io/en/stable/index.html), though not directly.

import zarr
data = zarr.zeros((10000, 5), chunks=(1000, 5), dtype='i4')
timestep = np.arange(len(data))

tsdframe = nap.TsdFrame(t=timestep, d=data)

# %%
# As the warning suggests, `data` is converted to a numpy array.

print(type(tsdframe.d))

# %%
# To keep a zarr array, you can set the argument `load_array` to `False`.

tsdframe = nap.TsdFrame(t=timestep, d=data, load_array=False)

print(type(tsdframe.d))

# %%
# Within pynapple, numpy memory maps are recognized as numpy arrays while zarr arrays are not.

print(type(fp), "Is np.ndarray? ", isinstance(fp, np.ndarray))
print(type(data), "Is np.ndarray? ", isinstance(data, np.ndarray))


# %%
# As with numpy memory maps, you can use pynapple functions directly.

ep = nap.IntervalSet(0, 10)
tsdframe.restrict(ep)

# %%
group = nap.TsGroup({0:nap.Ts(t=[10, 20, 30])})

sta = nap.compute_event_trigger_average(group, tsdframe, 1, (-2, 3))

print(type(tsdframe.values))
print("\n")
print(sta)
@@ -67,15 +67,15 @@
print(spikes)

# %%
# In this case, the TsGroup holds 15 neurons and it is possible to access, similar to a dictionnary, the spike times of a single neuron:
# In this case, the TsGroup holds 15 neurons and it is possible to access, similar to a dictionary, the spike times of a single neuron:
neuron_0 = spikes[0]
print(neuron_0)

# %%
# `neuron_0` is a [Ts](https://pynapple-org.github.io/pynapple/core.time_series/#pynapple.core.time_series.Ts) object containing the times of the spikes.

# %%
# The other information about the session is contained in `nwb["epochs"]`. In this case, the start and end of the sleep and wake epochs. If the NWB time intervals contains tags of the epochs, pynapple will try to group them together and return a dictionnary of IntervalSet instead of IntervalSet.
# The other information about the session is contained in `nwb["epochs"]`. In this case, the start and end of the sleep and wake epochs. If the NWB time intervals contain tags of the epochs, pynapple will try to group them together and return a dictionary of IntervalSet objects instead of a single IntervalSet.
epochs = nwb["epochs"]
print(epochs)
