Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added lazy loading option for nwb loading function #313

Merged
merged 18 commits into from
Jul 18, 2024

Conversation

vigji
Copy link
Collaborator

@vigji vigji commented Jul 15, 2024

I was missing the option of changing the lazy loading behavior of NWB files when going through the function. This PR is a very minimal parameter piping from the function to the underlying NWBFile object instantiation.

More general question about this function, for which I lack the broader context of where the package is going: it sounds funny to have the same function for such different constructs: on the one side, nwb files which are equivalent to whole collections of Tsds/TsdFrames/TsdTensors, on the other, npc files which from what I understand contain individually one of those items and as a consequence lacks a lazy_loading behavior (even though that could be implemented by returning the object and not the object.load() output; but this would either change the current default for load_file() on the NWB, by default true, and for the npz file, by default false).

Let me know what do you think about this!

@vigji vigji requested a review from gviejo as a code owner July 15, 2024 06:47
@gviejo
Copy link
Contributor

gviejo commented Jul 15, 2024

Reading your questions makes me thinks there should be a nap.load_nwb_file and nap.load_npz_file. That would make things cleaner.
Would you be willing to write those functions in the PR and mark nap.load_file as deprecated?

@gviejo
Copy link
Contributor

gviejo commented Jul 15, 2024

Alternatively, the lazy_loading argument could be propagated to np.load throughs the mmap_mode argument. It supports memory map if I understand correctly the documentation.

@BalzaniEdoardo
Copy link
Collaborator

I would vote for the second option ("mmap_mode") so that people do not have to remember two different commands for loading. I think it looks cleaner.

@vigji
Copy link
Collaborator Author

vigji commented Jul 15, 2024

@BalzaniEdoardo I do not know your current preferred usage of the package (for nwbs or folders of npzs) but it is a bit strange to keep the same function to load such different things (either whole datasets or parts of them).

I don't mind particularly, happy to evolve this PR in either direction, but from a design POV what @gviejo proposed (split and deprecate) feels definitively cleaner

@gviejo
Copy link
Contributor

gviejo commented Jul 15, 2024

I think best decision here is to give us minimal work. I would choose to keep a single nap.load_file. Numpy np.load_file does the same by taking either .npz or .npy.

@vigji
Copy link
Collaborator Author

vigji commented Jul 15, 2024

Ok! what about default of lazy mode then? True would change current behavior for .npz, False would change behavior for .nwb.

@gviejo
Copy link
Contributor

gviejo commented Jul 16, 2024

Ok so default is None and you go by default for lazy loading for NWB and not for NPZ. Changing this to True or False forces the behavior.

@vigji
Copy link
Collaborator Author

vigji commented Jul 16, 2024

Ok!
Implemented required behavior in load_file, but also added a refactoring proposal for NPZFile
Here is a draft, if you can have a look. The idea is to have a lazy loader object for the npz file, which in principle could have been a drop in replacement for the old class. However, I also noticed that the syntax for both the determination and in particular the instantiation was quite convoluted.

Everything can be very streamlined if we move the constructors from the file to be class methods of each relative object. I think this makes lot of sense as it binds more tightly together the save and the loading. Does this looks reasonable?

Copy link
Collaborator

@BalzaniEdoardo BalzaniEdoardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, but I think having a class method for each object is too much. I think we should have a single method for all the time-series, implemented in Base.

I added the code in the comments but it should look something like this

    @classmethod
    def _from_npz_reader(cls, file)
        kwargs = {key: val for key, val in file.items() if key not in ["start", "end", "type"]}
        iset = nap.IntervalSet(start=file["start"], end=file["end"])
        return cls(time_support=iset, **kwargs)

Copy link
Collaborator

@BalzaniEdoardo BalzaniEdoardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should add tests to the npz loader to check that npz could be memmapped for real. my worry is that the memmap parameter of load will be ignored for npz, but this needs to be tested.

Independently, I still think the classmethod for loading the time series is quite elegant, and can simplify the io.interface_npz.py.

All the interface has to do is figure out the type, and call the class method.

@vigji
Copy link
Collaborator Author

vigji commented Jul 17, 2024

This should be done! Alas, no lazy mode for npz files. I really could not figure that out.

The interface looks cleaner now, so, it was not for nothing :)

A note: I am not sure to which degree there should be support for npz files without a type specification. If this is there for legacy files only, I would raise a warning and deprecate in the long term.

@vigji vigji requested a review from BalzaniEdoardo July 17, 2024 09:03
@vigji
Copy link
Collaborator Author

vigji commented Jul 17, 2024

Ok, everything should be good now!

Let me know about that point on the deprecation, if you want I can add the warning in this PR. Otherwise should be ready for merging

@gviejo
Copy link
Contributor

gviejo commented Jul 18, 2024

Thanks Luigi

@gviejo gviejo merged commit ef9dc1a into pynapple-org:dev Jul 18, 2024
11 checks passed
@BalzaniEdoardo
Copy link
Collaborator

BalzaniEdoardo commented Jul 18, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants