Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make transforms optional #4190

Merged
merged 12 commits into from
Apr 29, 2022
Merged

Make transforms optional #4190

merged 12 commits into from
Apr 29, 2022

Conversation

bhashemian
Copy link
Member

Description

This PR makes the transform argument in the existing datasets optional (CacheDataset, SmartDataset, etc.). I believe the user should be able to use these datasets without providing a transform.

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).

@bhashemian bhashemian requested review from Nic-Ma and wyli April 27, 2022 19:50
@Nic-Ma
Copy link
Contributor

Nic-Ma commented Apr 28, 2022

Hi @drbeh ,

Thanks for raising this discussion.
I think we didn't default transform to None in the cache datasets because if you don't have transform, then no need to cache anymore, so no need to use these datasets.
What do you think?
CC @wyli @ericspod @rijobro for more discussion if necessary.

Thanks.

@bhashemian
Copy link
Member Author

bhashemian commented Apr 28, 2022

Hi @drbeh ,

Thanks for raising this discussion. I think we didn't default transform to None in the cache datasets because if you don't have transform, then no need to cache anymore, so no need to use these datasets. What do you think? CC @wyli @ericspod @rijobro for more discussion if necessary.

Thanks.

Hi @Nic-Ma, thanks for your reply. I see your point but what if the user want to cache the images (without any transform) to save the IO time?

In pathology that loading patched are very costly, one may want to use smart cache dataset even without any transform.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Apr 28, 2022

Hi @drbeh ,

Thanks for your explanation.
If you want to cache the loaded images, usually at least you need a LoadImaged transform instead of None transform.
Could you please help provide some example or draft code to help us understand your use case?

Thanks in advance.

@bhashemian
Copy link
Member Author

@Nic-Ma, PatchWSIDataset, for instance, does not require any transform and extract and load patches from whole slide images.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Apr 28, 2022

@Nic-Ma, PatchWSIDataset, for instance, does not require any transform and extract and load patches from whole slide images.

Hi @drbeh ,

I think we already set transform=None as default for PatchWSIDataset:
https://github.com/Project-MONAI/MONAI/blob/dev/monai/data/wsi_datasets.py#L58
And it's the same as its base class:
https://github.com/Project-MONAI/MONAI/blob/dev/monai/data/dataset.py#L65
I mean we usually have transforms for cache datasets, not all kinds of datasets, or maybe I misunderstood your use case?

Thanks.

@bhashemian
Copy link
Member Author

@Nic-Ma, PatchWSIDataset, for instance, does not require any transform and extract and load patches from whole slide images.

Hi @drbeh ,

I think we already set transform=None as default for PatchWSIDataset: https://github.com/Project-MONAI/MONAI/blob/dev/monai/data/wsi_datasets.py#L58 And it's the same as its base class: https://github.com/Project-MONAI/MONAI/blob/dev/monai/data/dataset.py#L65 I mean we usually have transforms for cache datasets, not all kinds of datasets, or maybe I misunderstood your use case?

Thanks.

I'm trying to give you an idea where it is possible to need caching data without having a transform. Maybe it's easier to discuss it online.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Apr 28, 2022

After some discussion, here are my final comments:

  1. We can add transforms=None to CacheDataset and its subclasses, for the use case to cache data of datasets chain:
data = WSIPatchDataset(..., transform=None)
cachedata = SmartCacheDataset(data=data, transform=None, replace_rate=0.2)
  1. But no need to add transform=None to PersistentDataset, because PersistentDataset must start from file path and compute hash value with it.
  2. Please add more unit tests to cover the transforms=None in these datasets.
  3. And please also add the missing unit test case you said in: Disable pylint error and fix CI tests of new tifffile #4162 (comment)

@wyli @ericspod Do you have any other opinions?

Thanks in advance.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Apr 28, 2022

Hi @drbeh ,

And I think maybe it's better to add None support for the transform arg but doesn't set the default value to None, because for most cases we expect transforms, refer to:
https://github.com/Project-MONAI/MONAI/blob/dev/monai/data/dataset.py#L204

What do you think?

Thanks.

@bhashemian
Copy link
Member Author

We can do that but it's the same for Dataset where we set default value to None:

def __init__(self, data: Sequence, transform: Optional[Callable] = None) -> None:

Isn't better to have the same thing here?

@bhashemian
Copy link
Member Author

@wyli @Nic-Ma any other comments here?

Copy link
Contributor

@wyli wyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it looks good to me. I'll merge this once this is fixed #4200

@wyli
Copy link
Contributor

wyli commented Apr 29, 2022

/build

@wyli wyli enabled auto-merge (squash) April 29, 2022 19:15
@wyli wyli merged commit 1331432 into Project-MONAI:dev Apr 29, 2022
Can-Zhao pushed a commit to Can-Zhao/MONAI that referenced this pull request May 10, 2022
* Make all transforms optional

Signed-off-by: Behrooz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants