MultiCompose and SegmentationCompose [proof-of-concept] #1315

Open
wants to merge 7 commits into main

Conversation

@Noiredd commented Sep 9, 2019

Follow up to #1169 (see that and comments for details).

I added MultiCompose and SegmentationCompose classes to allow processing multiple images with the same random transformations. SegmentationCompose is slightly more intelligent - it knows which transforms to execute on the label and which would destroy it (e.g. ColorJitter).

Most of the transforms have already been adjusted in accordance with step 1 described in #1169. Only RandomTransforms needs some attention (it might be tricky to rework).
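
For illustration, a minimal sketch of what such an adjusted transform could look like (a hypothetical simplification, not the code in this PR; see #1169 for the actual design): the transform exposes a generate_params method and its __call__ optionally accepts pre-generated parameters, so the same random decision can be shared across several inputs.

import random
from torchvision.transforms import functional as F

class RandomHorizontalFlipWithParams:
    """Hypothetical simplification of an adjusted transform."""
    def __init__(self, p=0.5):
        self.p = p

    def generate_params(self, img):
        # draw the random decision once so it can be reused for other inputs
        return {'flip': random.random() < self.p}

    def __call__(self, img, params=None):
        if params is None:
            params = self.generate_params(img)
        return F.hflip(img) if params['flip'] else img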

I tested the functionality using the following:

import torch
import torchvision as tv
from PIL import Image

def seg_loader(paths):
    img_path, lab_path = paths
    with open(img_path, 'rb') as f:
        img = Image.open(f).convert('RGB')
    with open(lab_path, 'rb') as f:
        lab = Image.open(f).convert('RGB')
    return img, lab

class SegmentationFolder(tv.datasets.VisionDataset):
    def __init__(self, root, transforms=None, loader=seg_loader):
        super(SegmentationFolder, self).__init__(root, transforms=transforms)
        self.root_dir = root
        self.loader = loader
        self.images = [
            'src/images/12234.png',
            'src/images/20703.png',
            'src/images/33303.png',
            'src/images/40134.png',
            'src/images/57535.png',
        ]
        self.labels = [
            'src/labels/12234.png',
            'src/labels/20703.png',
            'src/labels/33303.png',
            'src/labels/40134.png',
            'src/labels/57535.png',
        ]

    def __getitem__(self, index):
        paths = (self.images[index], self.labels[index])
        return self.transforms(self.loader(paths))

    def __len__(self):
        return len(self.labels)

tfm = tv.transforms.SegmentationCompose([
    tv.transforms.RandomCrop(400),
    tv.transforms.RandomHorizontalFlip(),
    tv.transforms.ColorJitter(brightness=3),
    tv.transforms.RandomPerspective(),
    tv.transforms.ToTensor(),
    tv.transforms.Normalize((0, 0, 0), (1,1,1)),
])
src = SegmentationFolder(root='', transforms=tfm)
ldr = torch.utils.data.DataLoader(src, batch_size=1)
for b in ldr:
    print(b[0].shape, b[1].shape, b[1].numpy().max())

* added MultiCompose and SegmentationCompose for processing multiple
  images with the same random transformations
* adjusted some of the transforms to allow this
  for example, do:
    tf = tv.transforms.SegmentationCompose([
        tv.transforms.CenterCrop(400),
        tv.transforms.RandomHorizontalFlip(),
    ])
* TODO for full readiness: modify the following transforms:
  RandomTransforms, RandomCrop, RandomPerspective, RandomResizedCrop,
  ColorJitter, RandomAffine, RandomGrayscale, RandomErasing
also change its interface - some transforms need to know image shape
@fmassa (Member) commented Sep 10, 2019

Hi @Noiredd , thanks for the PR!

I put some of my thoughts in #1169 , let me know what you think

also fix param passing bug in both Multi and Segmentation versions
@codecov-io commented Sep 10, 2019

Codecov Report

Merging #1315 into master will decrease coverage by 0.57%.
The diff coverage is 40.86%.


@@            Coverage Diff             @@
##           master    #1315      +/-   ##
==========================================
- Coverage   65.81%   65.24%   -0.58%     
==========================================
  Files          75       75              
  Lines        5821     5918      +97     
  Branches      886      906      +20     
==========================================
+ Hits         3831     3861      +30     
- Misses       1725     1789      +64     
- Partials      265      268       +3
Impacted Files Coverage Δ
torchvision/transforms/transforms.py 72.8% <40.86%> (-8.15%) ⬇️
torchvision/ops/roi_pool.py 70.21% <0%> (ø) ⬆️
torchvision/datasets/mnist.py 51.7% <0%> (ø) ⬆️
torchvision/ops/roi_align.py 68% <0%> (ø) ⬆️
torchvision/ops/boxes.py 94.73% <0%> (+0.14%) ⬆️
torchvision/utils.py 66.03% <0%> (+0.65%) ⬆️
torchvision/extension.py 40.9% <0%> (+2.81%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@fmassa (Member) left a comment

Thanks for the PR!

The new implementation handles some more cases, but I don't think it is general enough. We will end up making assumptions that will not hold for some use-cases, and then we will either:

  • increase the complexity of the transforms to support that extra use-case
  • decide not to support that use-case

The main problem is that I'm not sure we can have a general implementation that is still simple for users to use and understand.
I've spent quite some time in the past looking into it, and I was not happy with any of the approaches I saw (because of the aforementioned issues).

My current feeling is that the transforms (for anything more complicated than single images) are application-specific (and not even domain-specific like segmentation), and thus should be handled by the application. An example can be found in the references folder for each application (which can be trivially done by the user thanks to the functional interface).

Thoughts?

Assumes that the inputs are tuples of:
(image, label)
or even
(image, image, ..., image, label)
@fmassa (Member) commented on the diff:

While this is the most common case, it does not handle other (segmentation) use-cases that can appear, where we do not have a straightforward concept of image and label.
Imagine for example that we are predicting a depth map and a segmentation map from a pair of two images.
In this case, we would have two images and two labels, which is currently not supported by this approach.

@Noiredd (Author) commented Sep 10, 2019

My goal is not to build the most general data augmentation solution possible, because that has already been accomplished: it's functional.py. For anyone with an unusual problem, that is the way to go; I agree with you that PyTorch shouldn't be bloated with application-specific solutions.

But I don't think that simple image-to-classes segmentation is an unusual problem. In my personal experience it pops up everywhere, most of the time in this simplest formulation. Right now I keep writing (copy-pasting) the same code over and over to handle the exact same hassle (ensuring random parameters are the same between images and labels) and to avoid the exact same pitfalls (transforms that should not be directly applied to labels), and it happens very often (and not just to me). It is my honest opinion that segmentation deserves a built-in, dedicated solution instead of having users reinvent the same wheel.

This PR doesn't offer anything more general because that's exactly how I envisioned it: solving one highly specific problem. However, it does make life considerably easier even for those with more unusual tasks. Introducing a uniform interface for parameter generation (generate_params) means that anyone can now write their own Compose, using MultiCompose or SegmentationCompose as a reference for their particular case.
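
As an illustration of that point (hypothetical code, not part of this PR), a user-side Compose for the two-images-plus-depth-plus-mask scenario mentioned above could look roughly like this, assuming every transform exposes generate_params and accepts the params in __call__:

class CustomMultiCompose:
    """Hypothetical user-written Compose: shares random parameters across all
    inputs, but skips the listed transform types on the targets."""
    def __init__(self, transforms, skip_for_targets=()):
        self.transforms = transforms
        self.skip_for_targets = tuple(skip_for_targets)

    def __call__(self, images, targets):
        for t in self.transforms:
            params = t.generate_params(images[0])         # hypothetical method
            images = [t(img, params) for img in images]   # hypothetical __call__(img, params)
            if not isinstance(t, self.skip_for_targets):
                targets = [t(tgt, params) for tgt in targets]
        return images, targets

For the depth-plus-segmentation example, images would hold the two input frames, targets the depth map and the mask, and ColorJitter would go into skip_for_targets.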

@SebastienEske commented:
Hello there, I made this pull request which I believe could be simple and generic enough.

@SebastienEske commented Sep 18, 2019

Regarding the issue with synchronizing random transforms across several uses, could we maybe add an optional seed parameter to each random transform? If they use the same seed for their random number generator, that could work, couldn't it?

@fmassa (Member) commented Sep 19, 2019

@SebastienEske

Regarding the issue with synchronizing random transforms across several uses, could we maybe add an optional seed parameter to each random transform? If they use the same seed for their random number generator, that could work, couldn't it?

This is a solution similar to what was proposed in #9 (comment). The discussion in there is very interesting, and there are a few good alternatives. We are discussing potential ways of improving the transforms and making them more generic.

cc @cpuhrsch @zhangguanheng66 @vincentqb for thoughts

@Noiredd (Author) commented Sep 20, 2019

I agree with @colesbury (#9 (comment)) that sharing the RNG seed is a hack. But let's look at the root of the issue this approach is trying to solve: we have multiple independent* transforms, yet we need them to share something. Why not just give that something a meaningful identity, for example as a concept of transformation parameters, which is IMHO the most natural idea? Your idea from #9 (comment) is kind of that, except I can't see why we would need two classes for each transform. The way it's implemented right now, where each class represents a transformation and its parameter generation, is much more convenient - except for the get_params method having been made static.
What benefit that offers, I must confess, eludes me. In my view, the only two things currently standing in the way of applying transforms over multiple images are (1) that method, and (2) the ability to actually use the generated parameters. Had get_params had a common interface (like generate_params in my proposal), and had __call__ accepted those params as input, writing a custom Compose would be a breeze. Sorry if I'm being overly critical of the current system, but I can't help feeling like I'm bending over backwards when I have to do this:

import torchvision as tv
import torchvision.transforms.functional as F

affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(0.1, 0.3), scale=(0.5, 2))
params = affine_augment.get_params(
    degrees=affine_augment.degrees,
    translate=affine_augment.translate,
    scale_ranges=affine_augment.scale,
    shears=affine_augment.shear,
    img_size=image.size
)  # why static? had it been an object method, I wouldn't have to retrieve all those attributes just to feed them back to it o_O
image = F.affine(image, *params)
label = F.affine(label, *params)

instead of just this:

affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(0.1, 0.3), scale=(0.5, 2))
params = affine_augment.get_params(image)  # the object has everything it needs!
image = affine_augment(image, params)
label = affine_augment(label, params)

There are probably other, maybe better ways to accomplish the goals that SegmentationCompose and SegmentationDataset attempt. But whatever those are, I'm convinced that changes to the transforms are necessary anyway. It's either that or defaulting to the functional interface, which I will personally always advocate against (in this context) - there's too much valuable infrastructure in get_params already.

* - Transforms have to be independent IMHO: if we rotate an image, we might want to interpolate it; but for its segmentation mask we cannot do that (it would corrupt the class labels); and for object bounding boxes we need a completely different function. Etc.

@SebastienEske commented Sep 20, 2019

Hello, I really think that if each transform had its own RNG object, we could provide it with a seed and that would solve the problem. The comment in #9 uses a hack in the training loop. What I am suggesting is more along the lines of:

import random

class MyRandomTransform:
    def __init__(self, seed=None):
        self.rng = random.Random()  # or any other RNG object
        if seed is not None:
            self.rng.seed(seed)
    def __call__(self, img):
        p = self.rng.random()  # draw parameters from this transform's own RNG
        return img             # apply the transform using p (omitted here)

# two instances seeded identically produce the same random draws
a = MyRandomTransform(2)
b = MyRandomTransform(2)
image = a(image)
label = b(label)

Here is a more complete code example.
And this is a fully backward compatible change, since we only add one optional parameter when creating the transform.

@Noiredd (Author) commented Sep 20, 2019

This does not convince me. The solution with parameters is much more solid and at the same time gives you more freedom. You have one object that you can call any number of times, whereas in the RNG approach one needs to construct as many objects as there are inputs. Worse, those objects are only implicitly connected - one has to make sure that each of them is called exactly the same number of times. And what if you wanted to do something less standard, like freeze transform parameters temporarily, or not use a transform sometimes (maybe your number of inputs is variable)? The RNG solution fails miserably whenever you try to do anything that's not the standard use-case. This is exactly the level of fragility you don't want to have in a framework.

Explicit is better than implicit: direct control over the params is better than hoping that the implicit connection (via seed) will hold.
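
To make the contrast concrete, here is a minimal sketch (assuming the hypothetical generate_params / __call__(img, params) interface from this PR, not existing torchvision API) of the kind of control that seed synchronization cannot easily provide:

import torchvision as tv

crop = tv.transforms.RandomCrop(256)

def augment(frames):
    # one explicit parameter draw, reused for however many frames arrived;
    # the params object could also be stored and re-applied later ("frozen")
    params = crop.generate_params(frames[0])           # hypothetical method
    return [crop(frame, params) for frame in frames]   # hypothetical __call__(img, params)

With seed-synchronized RNGs, the same behaviour would require one seeded copy per frame and a guarantee that every copy is called exactly the same number of times.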

@SebastienEske commented Sep 20, 2019

And what if you wanted to do something less standard, like freeze transform parameters temporarily, or not use a transform sometimes (maybe your number of inputs is variable)?

Ok, I see your point. That seems to be more in the spirit of PyTorch, yes.

One caveat is that there is still no way to synchronize two random transforms that are passed to a dataset. That was the point of the seed option, which is actually very convenient when you only need the transforms for the dataset. Maybe we can implement both, with an adequate user warning in the documentation? This already happens for CUDA and for distributed training.

Another way would be to have a master argument when creating a random transform, pointing to a parent transform whose state should be reused.

affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(0.1, 0.3), scale=(0.5, 2))
affine_augment_sync = tv.transforms.RandomAffine(master=affine_augment)
image = affine_augment(image)
label = affine_augment_sync(label)

And inside the __call__ function we could have something like this:

def __call__(self, input, params=None):
    if self.master is not None and params is None:
        params = self.master.get_params()
    elif params is None:
        params = self.update_params()
    return self.apply_params(input, params)

That way we have a strong synchronization. And you can still use get_params as intended with a slightly different call order if you want to have something really customized in the training loop:

affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(0.1, 0.3), scale=(0.5, 2))
# update params and apply the transform
image = affine_augment(image)
# get the current params
params = affine_augment.get_params()
# do not update params, apply the provided ones instead
label = affine_augment(label, params)

That would solve both use cases right?

@Noiredd (Author) commented Oct 4, 2019

It's been beautifully written in The Zen of Python and I think it applies here: "There should be one-- and preferably only one --obvious way to do it". Synchronising via RNG seed is a bad idea - everything it offers can easily be done with my parameter-based proposal, and then some. Implementing both means we would end up with one way that is error-prone and generally broken, and another, correct one.

I answered you about the master-follower idea under #1406 (comment).
