MultiCompose and SegmentationCompose [proof-of-concept] #1315
Conversation
* Added `MultiCompose` and `SegmentationCompose` for processing multiple images with the same random transformations (a conceptual sketch of the idea follows this list).
* Adjusted some of the transforms to allow this. For example, you can now do:

  ```python
  tf = tv.transforms.SegmentationCompose([
      tv.transforms.CenterCrop(400),
      tv.transforms.RandomHorizontalFlip(),
  ])
  ```

* TODO for full readiness:
  * Modify the following transforms: `RandomTransforms`, `RandomCrop`, `RandomPerspective`, `RandomResizedCrop`, `ColorJitter`, `RandomAffine`, `RandomGrayscale`, `RandomErasing`.
  * Also change its interface - some transforms need to know the image shape.
  * Also fix the param passing bug in both the Multi and Segmentation versions.
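To make the idea concrete, here is a minimal conceptual sketch of such a compose. This is not the PR's actual implementation; it assumes a hypothetical transform interface where each transform can draw its random parameters once and then be applied with those fixed parameters:

```python
class MultiComposeSketch:
    """Conceptual sketch only (not the code in this PR).

    Applies every transform to all inputs (e.g. image and mask), drawing the
    random parameters once per transform and reusing them for each input.
    Assumes transforms with an instance-level get_params(img) and a call
    signature transform(img, params) - a hypothetical interface.
    """

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, inputs):
        for t in self.transforms:
            params = t.get_params(inputs[0])             # draw random parameters once
            inputs = [t(img, params) for img in inputs]  # reuse them for every input
        return tuple(inputs)
```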
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1315      +/-   ##
==========================================
- Coverage   65.81%   65.24%   -0.58%
==========================================
  Files          75       75
  Lines        5821     5918      +97
  Branches      886      906      +20
==========================================
+ Hits         3831     3861      +30
- Misses       1725     1789      +64
- Partials      265      268       +3
```
Continue to review full report at Codecov.
Thanks for the PR!
The new implementation handles some more cases, but I don't think it is general enough. We will end up making assumptions which will not hold true for some use cases, and we will either:
- increase the complexity of the transforms to support that extra use-case
- decide not to support that use-case
The main problem is that I'm not sure we can have a general implementation which is still simple to use and understand for the users.
I've spent quite some time in the past looking into it, and I was not happy with any of the approaches I saw (because of the aforementioned issues).
My current feeling is that the transforms (for anything more complicated than single images) are application-specific (and not even domain-specific like segmentation), and thus should be handled by the application. An example can be found in the references folder for each application (which can be trivially done by the user thanks to the functional interface).
Thoughts?
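For reference, the application-level approach described above (and used by the reference scripts) looks roughly like the following - a simplified sketch built on the functional interface, with illustrative class names rather than the exact reference code:

```python
import random
import torchvision.transforms.functional as F

class JointRandomHorizontalFlip:
    """Flip image and target together with a single coin toss."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image, target):
        if random.random() < self.p:
            image = F.hflip(image)
            target = F.hflip(target)
        return image, target

class JointCompose:
    """Compose transforms that operate on (image, target) pairs."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target
```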
Assumes that the inputs are tuples of:
(image, label)
or even
(image, image, ..., image, label)
While this is the most common case, it does not handle other (segmentation) use-cases that can appear, where we do not have a straight concept of image and label.
Imagine for example that we are predicting a depth map and a segmentation map from a pair of images.
In this case, we would have two images and two labels, which is currently not supported by this approach.
My goal is not to build the most general data augmentation solution possible, because that has already been accomplished elsewhere. But I don't think that simple image-to-classes segmentation is an unusual problem. In my personal experience it pops up everywhere, most of the time in this simplest formulation.

Right now I keep writing (copy-pasting) the same code over and over to handle the exact same hassle every time (ensuring random parameters are the same between images and labels) and to avoid the exact same pitfalls (transforms that should not be applied directly to labels), and it happens very often (and not just to me). It is my honest opinion that segmentation deserves a built-in, dedicated solution instead of having users reinvent the same wheel.

This PR doesn't offer anything more general because that's exactly how I envisioned it: solving one highly specific problem. However, it does make life considerably easier even for those with more unusual tasks. Introducing a uniform interface for parameter generation (…)
Hello there, I made this pull request which I believe could be simple and generic enough.
Regarding the issue with synchronizing random transforms across several uses, could we maybe add an optional seed parameter to each random transform? If they use the same seed for their random number generator, that could work, couldn't it?
This is a solution similar to what was proposed in #9 (comment). The discussion there is very interesting, and there are a few good alternatives. We are discussing potential ways of improving the transforms and making them more generic. cc @cpuhrsch @zhangguanheng66 @vincentqb for thoughts.
I agree with @colesbury (#9 (comment)), sharing the RNG seed is a hack. But let's look at the root of the issue this approach is trying to solve: we have multiple independent* transforms, but we need them to share something. Why not just give that something a meaningful identity? For example as a concept of transformation parameters, which is IMHO the most naturally occurring idea.

Your idea from #9 (comment) is kind of that, except I can't see why we would need 2 classes for each transform. The way it's implemented right now, where each class represents a transformation and its parameter generation, is way more convenient - except for the way parameters currently have to be fetched and passed around:

```python
affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(-50, 50), scale=(0.5, 2))
params = affine_augment.get_params(
    degrees=affine_augment.degrees,
    translate=affine_augment.translate,
    scale_ranges=affine_augment.scale,
    shears=affine_augment.shear,
    img_size=image.size
)  # why static? had it been an object method I wouldn't have to retrieve all those attributes just to feed them back to it o_O
image = F.affine(image, *params)
label = F.affine(label, *params)
```

instead of just this:

```python
affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(-50, 50), scale=(0.5, 2))
params = affine_augment.get_params(image)  # the object has everything it needs!
image = affine_augment(image, params)
label = affine_augment(label, params)
```

There are probably other, maybe better ways to accomplish the same goals.

\* Transforms have to be independent IMHO: if we rotate an image, we might want to interpolate it; but for its segmentation mask we cannot do this; and for object bounding boxes we need a completely different function. Etc.
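For concreteness, a minimal sketch of what a transform with such a two-argument interface could look like - this is not torchvision's current API; the class below is hypothetical and only illustrates the proposal:

```python
import random
import torchvision.transforms.functional as F

class ReusableRandomHorizontalFlip:
    """Hypothetical transform whose parameters can be drawn once and reused."""

    def __init__(self, p=0.5):
        self.p = p

    def get_params(self, img):
        # Draw the random decision once and return it as the transform's parameters.
        return {"flip": random.random() < self.p}

    def __call__(self, img, params=None):
        # Draw fresh parameters if none are given; otherwise reuse the provided ones.
        if params is None:
            params = self.get_params(img)
        return F.hflip(img) if params["flip"] else img

# Usage: the same parameters are applied to both the image and its label.
# flip = ReusableRandomHorizontalFlip()
# params = flip.get_params(image)
# image = flip(image, params)
# label = flip(label, params)
```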
Hello, I really do think that if each transform had its own RNG object, we could provide it with a seed and that would solve the problem. The comment in #9 uses a hack in the training loop. What I am suggesting is more along the lines of:

```python
class MyRandomTransform:
    def __init__(self, seed=None):
        self.rng = CMWC()  # or any other RNG object
        if seed is not None:
            self.rng.seed(seed)

    def apply(self, img):
        # draw the random parameters from this transform's own RNG
        p = self.rng.random()
        # ... apply the transform to img using p ...
        return img

# two instances seeded identically draw the same sequence of random numbers
a = MyRandomTransform(seed=2)
b = MyRandomTransform(seed=2)
image = a.apply(image)
label = b.apply(label)
```

Here is a more complete code example.
This does not convince me. The solution with parameters is much more solid and at the same time gives you more freedom. You have one object that you can call any number of times, whereas in the RNG approach one needs to construct as many objects as there are inputs. Worse, those objects are only implicitly connected - one has to make sure that each of them is called exactly the same number of times.

And what if you wanted to do something less standard, like freeze transform parameters temporarily, or not use a transform sometimes (maybe your number of inputs is variable)? The RNG solution fails miserably whenever you try to do anything that's not the standard use case. This is exactly the level of fragility you don't want to have in a framework. Explicit is better than implicit: direct control over the params is better than hoping that the implicit connection (via the seed) will hold.
Ok, I see your point. That seems to be more in the spirit of PyTorch, yes. One caveat is that there is still no way to synchronize two random transforms that are passed to a dataset. That was the point of the seed option, which is actually very convenient when you only need the transforms for the dataset. Maybe we can implement both, with an adequate user warning in the documentation? This already happens for CUDA and for distributed training.

Another way would be to have a master transform:

```python
affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(-50, 50), scale=(0.5, 2))
affine_augment_sync = tv.transforms.RandomAffine(master=affine_augment)
image = affine_augment(image)
label = affine_augment_sync(label)
```

And inside the transform we would have something like:

```python
def __apply__(self, input, params=None):
    if self.master is not None and params is None:
        params = self.master.get_params()
    elif params is None:
        params = self.update_params()
    return apply_params(input, params)
```

That way we have a strong synchronization. And you can still use explicit params:

```python
affine_augment = tv.transforms.RandomAffine(degrees=90, translate=(-50, 50), scale=(0.5, 2))
# update params and apply transform
image = affine_augment(image)
# get current params
params = affine_augment.get_params()
# do not update params, apply the provided ones instead
label = affine_augment_sync(label, params)
```

That would solve both use cases, right?
It's been beautifully written in The Zen of Python, and I think it applies here: "There should be one-- and preferably only one --obvious way to do it." Synchronising via RNG seed is a bad idea - everything it offers can be done just as easily with my parameter-based proposal, and then some. Implementing both would mean we have one way to do it that is error-prone and generally broken, and another, correct one. I answered you about the master-follower idea under #1406 (comment).
Follow-up to #1169 (see that issue and its comments for details).

I added `MultiCompose` and `SegmentationCompose` classes to allow processing multiple images with the same random transformations. `SegmentationCompose` is slightly more intelligent - it knows which transforms to execute on the label and which would destroy it (e.g. `ColorJitter`).

Most of the transforms have already been adjusted in accordance with step 1 described in #1169. Only `RandomTransforms` need some attention (might be tricky to rework).

I tested the functionality using the following:
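The test snippet itself is not preserved in this thread; purely as an illustration of the kind of check involved (the file names and the exact call signature are assumptions, not the author's code):

```python
# Hypothetical smoke test, assuming the SegmentationCompose API proposed in this PR
# and that it consumes and returns (image, label) tuples.
import torchvision as tv
from PIL import Image

joint_tf = tv.transforms.SegmentationCompose([
    tv.transforms.CenterCrop(400),
    tv.transforms.RandomHorizontalFlip(),
])

image = Image.open("example_image.png")  # assumed sample files
label = Image.open("example_mask.png")

image_t, label_t = joint_tf((image, label))
# Geometric transforms should leave image and label aligned.
assert image_t.size == label_t.size
```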