[Feature proposal] Allow processing multiple images with transforms.Compose #1169
Comments
Image segmentation is a fairly common application nowadays. For now I'm using https://github.com/Jonas1312/pytorch-segmentation-dataset, but it would be nice if torchvision supported data/mask augmentation natively.
Hi @Noiredd. Thanks for the proposal, the PR in #1315, and the discussion, and sorry for not getting back to you before. I believe the problem mentioned in #611 (comment) is still valid here.
Another example: we add a […]. I think this is a fundamental problem with extending the […]. This is the reason why we have been advocating for using the functional interface (e.g., in #610). In the same way that PyTorch doesn't support […], what I do think we should be doing instead is to more broadly use the functional interface, and use […]. Thoughts?
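The gist of the functional-interface approach referred to above, as a minimal sketch (the transform choice, crop size, and the `joint_transform` name are illustrative only, not from this thread):

```python
import random

import torchvision.transforms as transforms
import torchvision.transforms.functional as F

def joint_transform(image, mask):
    # Draw the random crop parameters once...
    i, j, h, w = transforms.RandomCrop.get_params(image, output_size=(256, 256))
    # ...then apply the exact same crop to both image and mask.
    image = F.crop(image, i, j, h, w)
    mask = F.crop(mask, i, j, h, w)
    # One coin toss shared by both inputs.
    if random.random() < 0.5:
        image = F.hflip(image)
        mask = F.hflip(mask)
    return image, mask
```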
Just as you were writing that comment, I was working on a solution to this very issue :) My idea: there are basically three cases with augmentations for segmentation: […]
The new version of […] If an image-only transform (case 2) is detected, in the label-specific pipeline it is replaced with a new […] (see the sketch below). Please see the newest commits in #1315 and let me know what you think of the approach. I do realize this adds a little overhead to creating new augmentations (as one has to remember to register them to […]).
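From the description, the class that replaces an image-only transform in the label pipeline is presumably a pass-through; a guess at its shape (the name `NoOp` is mine, not from the PR):

```python
class NoOp:
    # Stands in for an image-only transform in the label pipeline,
    # keeping the two pipelines index-aligned.
    def __call__(self, img):
        return img
```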
@Noiredd I've replied in the PR, let's maybe keep the discussion there now? #1315 (review)
Proposal
Following discussions started in #9, #230, #240, and most recently in #610, I would like to propose the following change to `transforms.Compose` (and the transform classes) that would allow easy processing of multiple images with the same parameters using existing infrastructure. I think this would be very useful for segmentation tasks, where both the image and label need to be transformed exactly the same way.

Details
Currently the problem is that each transform, when called, implicitly generates randomized parameters (if it is a random transform) right before computing the transformation. In my opinion, it doesn't have to be so: parameter generation (`get_params`) is already separated from the actual image operation (which relies on the functional backend). My idea comes in two parts: first, completely decouple parameter generation from transformation; then allow `Compose` to generate parameters once and apply transformations multiple times.

Step 1, on the example of `RandomResizedCrop`:

- Add a `generate_params` method, to access the existing `get_params` but without the need to pass specific arguments. This function would look exactly the same for every transform that needs any random parameters. Passing specific arguments to `get_params` will be implementation-dependent.
- Change `__call__` to optionally accept a tuple of pre-generated params, as in the sketch below:
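A minimal sketch of both changes (assuming the current `RandomResizedCrop` attributes `scale`, `ratio`, `size`, and `interpolation`; the exact signature is open to discussion):

```python
import torchvision.transforms as transforms
import torchvision.transforms.functional as F

class RandomResizedCrop(transforms.RandomResizedCrop):
    def generate_params(self, img):
        # Uniform signature across all random transforms; the
        # transform-specific arguments come from instance attributes.
        return self.get_params(img, self.scale, self.ratio)

    def __call__(self, img, params=None):
        # Optionally accept pre-generated params; drawing them here
        # when absent preserves the current behaviour.
        if params is None:
            params = self.generate_params(img)
        i, j, h, w = params
        return F.resized_crop(img, i, j, h, w, self.size, self.interpolation)
```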
Step 2 is enabling this functionality in `Compose` by changing `__call__` to accept iterables. Alternatively, we could subclass it entirely, which I will do in this example:
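A sketch of such a subclass (the name `TupleCompose` is a placeholder; the try/except is what distinguishes random transforms from static ones, as discussed under "Alternative approach" below):

```python
import torchvision.transforms as transforms

class TupleCompose(transforms.Compose):
    def __call__(self, imgs):
        for t in self.transforms:
            try:
                # Random transform: generate parameters once per tuple...
                params = t.generate_params(imgs[0])
                # ...and reuse them for every item.
                imgs = [t(img, params) for img in imgs]
            except AttributeError:
                # Static transform without generate_params: apply as-is.
                imgs = [t(img) for img in imgs]
        return imgs
```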
Subclassing offers some advantages; for example, interpolation methods could be bound to iterable indices at `__init__`, so we could interpolate the first item bilinearly and the second with nearest-neighbour (ideal for segmentation), as sketched below.
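That binding could look like this (a sketch only; the `interpolations` argument is my invention, and wiring the per-index mode into each resampling transform's call is omitted):

```python
from PIL import Image

import torchvision.transforms as transforms

class TupleCompose(transforms.Compose):
    def __init__(self, transforms_list, interpolations=None):
        super().__init__(transforms_list)
        # One mode per tuple index: e.g. bilinear for the image,
        # nearest-neighbour for the label mask.
        self.interpolations = interpolations or [Image.BILINEAR, Image.NEAREST]
```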
Alternative approach

Instead of doing try/except in the `Compose` subclass, all transforms could be changed to inherit from a new `BaseTransform` abstract class, which could define `generate_params` as a trivial function returning `None`. Then we could just do:
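A sketch of the resulting loop (again with the placeholder `TupleCompose` name; it assumes every transform's `__call__` has been refactored to accept the optional params argument):

```python
import torchvision.transforms as transforms

class BaseTransform:
    def generate_params(self, img):
        # Trivial default inherited by static transforms.
        return None

class TupleCompose(transforms.Compose):
    def __call__(self, imgs):
        for t in self.transforms:
            # No try/except needed: every transform has generate_params,
            # and static ones simply return None.
            params = t.generate_params(imgs[0])
            imgs = [t(img, params) for img in imgs]
        return imgs
```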
because static transforms like `Pad` would simply return `None`, while any random transforms would need to define `generate_params` accordingly.

Yes, I do realize that this requires a slight refactoring of e.g. `RandomHorizontalFlip`.
Usage
The user could subclass `Dataset` to yield (image, label) tuples. This change would allow them to apply custom preprocessing/augmentation transformations separately, instead of hard-coding them in the dataset implementation using the functional backend. It would look sort of like this:
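Something along these lines (a sketch; `SegmentationDataset` and its path-list arguments are illustrative, not an existing API):

```python
from PIL import Image
from torch.utils.data import Dataset

class SegmentationDataset(Dataset):
    def __init__(self, image_paths, mask_paths, transform=None):
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.transform = transform  # e.g. a TupleCompose instance

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        mask = Image.open(self.mask_paths[idx])
        if self.transform is not None:
            # Both items go through the same transforms with shared params.
            image, mask = self.transform([image, mask])
        return image, mask
```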
I think this would be a significantly more convenient way of doing this.

Let me know if you think this is worth pursuing. I will have some free time next week, so if this would be useful and has a chance of being merged, I'd happily implement it myself. If you see any potential pitfalls or backwards-compatibility-breaking caveats, please tell me as well.
Addenda
Later I found PR #611, but it seems to have been abandoned by now, having encountered some issues that I think my plan of attack can overcome.
A good deal of the problems here stem from the fact that the `get_params` methods, since their introduction in #311, do not share an interface between classes. Instead, getting params for each transform is a completely different call; see the comparison below. This feels anti-OOP and counter-intuitive to me; are there any reasons why it has been made this way? @alykhantejani

cc @vfdev-5
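To make the mismatch concrete, compare three current per-class calls (signatures as they exist in torchvision at the time of writing; the input image is a placeholder):

```python
import torchvision.transforms as transforms
from PIL import Image

img = Image.new("RGB", (256, 256))  # placeholder input

# Three transforms, three unrelated get_params signatures:
i, j, h, w = transforms.RandomCrop.get_params(img, output_size=(224, 224))
i, j, h, w = transforms.RandomResizedCrop.get_params(img, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3))
angle = transforms.RandomRotation.get_params(degrees=(-30, 30))
```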