MultiCompose and SegmentationCompose [proof-of-concept] #1315

Open
wants to merge 7 commits into main

Conversation

@Noiredd commented Sep 9, 2019

Follow up to #1169 (see that and comments for details).

I added MultiCompose and SegmentationCompose classes to allow processing multiple images with the same random transformations. SegmentationCompose is slightly more intelligent - it knows which transforms to execute on the label and which would destroy it (e.g. ColorJitter).

Most of the transforms have already been adjusted in accordance with step 1 described in #1169. Only RandomTransforms needs some attention (it might be tricky to rework).
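
For illustration, a minimal sketch of what such an adjusted transform could look like (a hypothetical simplification, not the code in this PR; see #1169 for the actual design): the transform exposes a generate_params method and its __call__ optionally accepts pre-generated parameters, so the same random decision can be shared across several inputs.

import random
from torchvision.transforms import functional as F

class RandomHorizontalFlipWithParams:
    """Hypothetical simplification of an adjusted transform."""
    def __init__(self, p=0.5):
        self.p = p

    def generate_params(self, img):
        # draw the random decision once so it can be reused for other inputs
        return {'flip': random.random() < self.p}

    def __call__(self, img, params=None):
        if params is None:
            params = self.generate_params(img)
        return F.hflip(img) if params['flip'] else img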

I tested the functionality using the following:

import torch
import torchvision as tv
from PIL import Image

def seg_loader(paths):
    img_path, lab_path = paths
    with open(img_path, 'rb') as f:
        img = Image.open(f).convert('RGB')
    with open(lab_path, 'rb') as f:
        lab = Image.open(f).convert('RGB')
    return img, lab

class SegmentationFolder(tv.datasets.VisionDataset):
    def __init__(self, root, transforms=None, loader=seg_loader):
        super(SegmentationFolder, self).__init__(root, transforms=transforms)
        self.root_dir = root
        self.loader = loader
        self.images = [
            'src/images/12234.png',
            'src/images/20703.png',
            'src/images/33303.png',
            'src/images/40134.png',
            'src/images/57535.png',
        ]
        self.labels = [
            'src/labels/12234.png',
            'src/labels/20703.png',
            'src/labels/33303.png',
            'src/labels/40134.png',
            'src/labels/57535.png',
        ]

    def __getitem__(self, index):
        paths = (self.images[index], self.labels[index])
        return self.transforms(self.loader(paths))

    def __len__(self):
        return len(self.labels)

tfm = tv.transforms.SegmentationCompose([
    tv.transforms.RandomCrop(400),
    tv.transforms.RandomHorizontalFlip(),
    tv.transforms.ColorJitter(brightness=3),
    tv.transforms.RandomPerspective(),
    tv.transforms.ToTensor(),
    tv.transforms.Normalize((0, 0, 0), (1,1,1)),
])
src = SegmentationFolder(root='', transforms=tfm)
ldr = torch.utils.data.DataLoader(src, batch_size=1)
for b in ldr:
    print(b[0].shape, b[1].shape, b[1].numpy().max())

* added MultiCompose and SegmentationCompose for processing multiple
  images with the same random transformations
* adjusted some of the transforms to allow this
  for example, do:
    tf = tv.transforms.SegmentationCompose([
        tv.transforms.CenterCrop(400),
        tv.transforms.RandomHorizontalFlip(),
    ])
* TODO for full readiness: modify the following transforms:
  RandomTransforms, RandomCrop, RandomPerspective, RandomResizedCrop,
  ColorJitter, RandomAffine, RandomGrayscale, RandomErasing
also change its interface - some transforms need to know image shape
@fmassa (Member) commented Sep 10, 2019

Hi @Noiredd , thanks for the PR!

I put some of my thoughts in #1169 , let me know what you think

also fix param passing bug in both Multi and Segmentation versions
@codecov-io commented Sep 10, 2019

Codecov Report

Merging #1315 into master will decrease coverage by 0.57%.
The diff coverage is 40.86%.


@@            Coverage Diff             @@
##           master    #1315      +/-   ##
==========================================
- Coverage   65.81%   65.24%   -0.58%     
==========================================
  Files          75       75              
  Lines        5821     5918      +97     
  Branches      886      906      +20     
==========================================
+ Hits         3831     3861      +30     
- Misses       1725     1789      +64     
- Partials      265      268       +3
Impacted Files Coverage Δ
torchvision/transforms/transforms.py 72.8% <40.86%> (-8.15%) ⬇️
torchvision/ops/roi_pool.py 70.21% <0%> (ø) ⬆️
torchvision/datasets/mnist.py 51.7% <0%> (ø) ⬆️
torchvision/ops/roi_align.py 68% <0%> (ø) ⬆️
torchvision/ops/boxes.py 94.73% <0%> (+0.14%) ⬆️
torchvision/utils.py 66.03% <0%> (+0.65%) ⬆️
torchvision/extension.py 40.9% <0%> (+2.81%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@fmassa (Member) left a comment

Thanks for the PR!

The new implementation handles some more cases, but I don't think it is general enough. We will end up making assumptions that will not hold for some use-cases, and then we will either:

  • increase the complexity of the transforms to support that extra use-case
  • decide not to support that use-case

The main problem is that I'm not sure we can have a general implementation that is still simple for users to use and understand.
I've spent quite some time in the past looking into it, and I was not happy with any of the approaches I saw (because of the aforementioned issues).

My current feeling is that the transforms (for anything more complicated than single images) are application-specific (and not even domain-specific like segmentation), and thus should be handled by the application. An example can be found in the references folder for each application (which can be trivially done by the user thanks to the functional interface).

Thoughts?

Assumes that the inputs are tuples of:
(image, label)
or even
(image, image, ..., image, label)
@fmassa (Member) commented on the diff:

While this is the most common case, it does not handle other (segmentation) use-cases that can appear, where we do not have a straightforward concept of image and label.
Imagine for example that we are predicting a depth map and a segmentation map from a pair of two images.
In this case, we would have two images and two labels, which is currently not supported by this approach.

@Noiredd (Author) commented Sep 10, 2019

My goal is not to build the most general data augmentation solution possible, because that has already been accomplished: it's functional.py. For anyone with an unusual problem, that is the way to go; I agree with you that PyTorch shouldn't be bloated with application-specific solutions.

But I don't think that simple image-to-classes segmentation is an unusual problem. In my personal experience it pops up everywhere, most of the time in this simplest formulation. Right now I keep writing (copy-pasting) the same code over and over to handle the exact same hassle (ensuring random parameters are the same between images and labels) and to avoid the exact same pitfalls (transforms that should not be directly applied to labels), and it happens very often (and not just to me). It is my honest opinion that segmentation deserves a built-in, dedicated solution instead of having users reinvent the same wheel.

This PR doesn't offer anything more general because that's exactly how I envisioned it: solving one highly specific problem. However, it does make life considerably easier even for those with more unusual tasks. Introducing a uniform interface for parameter generation (generate_params) means that anyone can now write their own Compose, using MultiCompose or SegmentationCompose as a reference for their particular case.
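
As an illustration of that point (hypothetical code, not part of this PR), a user-side Compose for the two-images-plus-depth-plus-mask scenario mentioned above could look roughly like this, assuming every transform exposes generate_params and accepts the params in __call__:

class CustomMultiCompose:
    """Hypothetical user-written Compose: shares random parameters across all
    inputs, but skips the listed transform types on the targets."""
    def __init__(self, transforms, skip_for_targets=()):
        self.transforms = transforms
        self.skip_for_targets = tuple(skip_for_targets)

    def __call__(self, images, targets):
        for t in self.transforms:
            params = t.generate_params(images[0])         # hypothetical method
            images = [t(img, params) for img in images]   # hypothetical __call__(img, params)
            if not isinstance(t, self.skip_for_targets):
                targets = [t(tgt, params) for tgt in targets]
        return images, targets

For the depth-plus-segmentation example, images would hold the two input frames, targets the depth map and the mask, and ColorJitter would go into skip_for_targets.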

@SebastienEske commented:
Hello there, I made this pull request which I believe could be simple and generic enough.

@SebastienEske commented Sep 18, 2019

Regarding the issue with synchronizing random transforms across several uses, could we maybe add an optional seed parameter to each random transform? If they use the same seed for their random number generator, that could work, couldn't it?

@fmassa (Member) commented Sep 19, 2019

@SebastienEske

Regarding the issue with synchronizing random transforms across several uses, could we maybe add an optional seed parameter to each random transform? If they use the same seed for their random number generator, that could work, couldn't it?

This is a solution similar to what was proposed in #9 (comment). The discussion in there is very interesting, and there are a few good alternatives. We are discussing potential ways of improving the transforms and making them more generic.

cc @cpuhrsch @zhangguanheng66 @vincentqb for thoughts

@Noiredd (Author) commented Sep 20, 2019

I agree with @colesbury (#9 (comment)) that sharing the RNG seed is a hack. But let's look at the root of the issue this approach is trying to solve: we have multiple independent* transforms, yet we need them to share something. Why not just give that something a meaningful identity, for example as a concept of transformation parameters, which is IMHO the most natural idea? Your idea from #9 (comment) is kind of that, except I can't see why we would need two classes for each transform. The way it's implemented right now, where each class represents a transformation and its parameter generation, is much more convenient - except for the get_params method having been made static.
What benefit that offers, I must confess, eludes me. In my view, the only two things currently standing in the way of applying transforms over multiple images are (1) that method, and (2) the ability to actually use the generated parameters. Had get_params had a common interface (like generate_params in my proposal), and had __call__ accepted those params as input, writing a custom Compose would be a breeze. Sorry if I'm being overly critical of the current system, but I can't help feeling like I'm bending over backwards when I have to do this:

import torchvision as tv
import torchvision.transforms.functional as F

affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(0.1, 0.3), scale=(0.5, 2))
params = affine_augment.get_params(
    degrees=affine_augment.degrees,
    translate=affine_augment.translate,
    scale_ranges=affine_augment.scale,
    shears=affine_augment.shear,
    img_size=image.size
)  # why static? had it been an object method, I wouldn't have to retrieve all those attributes just to feed them back to it o_O
image = F.affine(image, *params)
label = F.affine(label, *params)

instead of just this:

affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(0.1, 0.3), scale=(0.5, 2))
params = affine_augment.get_params(image)  # the object has everything it needs!
image = affine_augment(image, params)
label = affine_augment(label, params)

There are probably other, maybe better ways to accomplish the goals that SegmentationCompose and SegmentationDataset attempt. But whatever those are, I'm convinced that changes to the transforms are necessary anyway. It's either that or defaulting to the functional interface, which I will personally always advocate against (in this context) - there's too much valuable infrastructure in get_params already.

* - Transforms have to be independent IMHO: if we rotate an image, we might want to interpolate it; but for its segmentation mask we cannot do that (it would corrupt the class labels); and for object bounding boxes we need a completely different function. Etc.

@SebastienEske commented Sep 20, 2019

Hello, I really think that if each transform had its own RNG object, we could provide it with a seed and that would solve the problem. The comment in #9 uses a hack in the training loop. What I am suggesting is more along the lines of:

import random

class MyRandomTransform:
    def __init__(self, seed=None):
        self.rng = random.Random()  # or any other RNG object
        if seed is not None:
            self.rng.seed(seed)
    def __call__(self, img):
        p = self.rng.random()  # draw parameters from this transform's own RNG
        return img             # apply the transform using p (omitted here)

# two instances seeded identically produce the same random draws
a = MyRandomTransform(2)
b = MyRandomTransform(2)
image = a(image)
label = b(label)

Here is a more complete code example.
And this is a fully backward compatible change, since we only add one optional parameter when creating the transform.

@Noiredd (Author) commented Sep 20, 2019

This does not convince me. The solution with parameters is much more solid and at the same time gives you more freedom. You have one object that you can call any number of times, whereas in the RNG approach one needs to construct as many objects as there are inputs. Worse, those objects are only implicitly connected - one has to make sure that each of them is called exactly the same number of times. And what if you wanted to do something less standard, like freeze transform parameters temporarily, or not use a transform sometimes (maybe your number of inputs is variable)? The RNG solution fails miserably whenever you try to do anything that's not the standard use-case. This is exactly the level of fragility you don't want to have in a framework.

Explicit is better than implicit: direct control over the params is better than hoping that the implicit connection (via seed) will hold.
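
To make the contrast concrete, here is a minimal sketch (assuming the hypothetical generate_params / __call__(img, params) interface from this PR, not existing torchvision API) of the kind of control that seed synchronization cannot easily provide:

import torchvision as tv

crop = tv.transforms.RandomCrop(256)

def augment(frames):
    # one explicit parameter draw, reused for however many frames arrived;
    # the params object could also be stored and re-applied later ("frozen")
    params = crop.generate_params(frames[0])           # hypothetical method
    return [crop(frame, params) for frame in frames]   # hypothetical __call__(img, params)

With seed-synchronized RNGs, the same behaviour would require one seeded copy per frame and a guarantee that every copy is called exactly the same number of times.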

@SebastienEske commented Sep 20, 2019

And what if you wanted to do something less standard, like freeze transform parameters temporarily, or not use a transform sometimes (maybe your number of inputs is variable)?

Ok, I see your point. That seems to be more in the spirit of PyTorch, yes.

One caveat is that there is still no way to synchronize two random transforms that are passed to a dataset. That was the point of the seed option, which is actually very convenient when you only need the transforms for the dataset. Maybe we can implement both, with an adequate user warning in the documentation? This already happens for CUDA and for distributed training.

Another way would be to have a master argument when creating a random transform, pointing to a parent transform whose state should be reused.

affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(0.1, 0.3), scale=(0.5, 2))
affine_augment_sync = tv.transforms.RandomAffine(master=affine_augment)
image = affine_augment(image)
label = affine_augment_sync(label)

And inside the __call__ function we could have something like this:

def __call__(self, input, params=None):
    if self.master is not None and params is None:
        params = self.master.get_params()
    elif params is None:
        params = self.update_params()
    return self.apply_params(input, params)

That way we have a strong synchronization. And you can still use get_params as intended with a slightly different call order if you want to have something really customized in the training loop:

affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(0.1, 0.3), scale=(0.5, 2))
# update params and apply the transform
image = affine_augment(image)
# get the current params
params = affine_augment.get_params()
# do not update params, apply the provided ones instead
label = affine_augment(label, params)

That would solve both use cases right?

@Noiredd (Author) commented Oct 4, 2019

It's been beautifully written in The Zen of Python and I think it applies here: "There should be one-- and preferably only one --obvious way to do it". Synchronising via RNG seed is a bad idea - everything it offers can easily be done with my parameter-based proposal, and then some. Implementing both means we would end up with one way that is error-prone and generally broken, and another, correct one.

I answered you about the master-follower idea under #1406 (comment).
