Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenCV transforms with tests #34

Closed
wants to merge 2 commits into from
Closed

Conversation

szagoruyko
Copy link
Contributor

Have been training on ImageNet and CIFAR with these, faster and more transparent than PIL, but such a pain to install. A few differences with PIL-based:

  • only np.ndarray on input-output
  • no torch dependency
  • bicubic interpolation by default everywhere

@alykhantejani
Copy link
Contributor

@fmassa @soumith was there a decision on this, did we want cv2 as a dependency?

@szagoruyko
Copy link
Contributor Author

@alykhantejani an option would be to try to import it and fill back to PIL, as for example done in chainercv https://github.com/chainer/chainercv/blob/master/chainercv/transforms/image/resize.py

@soumith
Copy link
Member

soumith commented Sep 5, 2017

making this explicit such as CVResize, CVNormalize etc. is so much better.
Importing and falling back is a bad idea because the resize and other functions are not bitwise compatible between both libraries (CV and PIL). So the same user script might converge to different accuracies depending on what is installed on the system (which is not good).

@fmassa
Copy link
Member

fmassa commented Sep 5, 2017

We already have two possible backends for vision (PIL and accimage), and I'm not sure if both are bitwise compatible either (even though they are only used for loading images at the moment)

I'd rather avoid using OpenCV for loading images (or at least would convert them to BGR), but we could eventually add another backend support in vision, and make use of those backends in the transforms. With a proper refactoring (which is going on in #240), I think it can have minimal code duplication.

What do you think?

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Feb 11, 2018

@fmassa @alykhantejani Any updates on integration of opencv as a new backend ? I could work on it if you think it will be integrated one day. For me PIL is rather restrictive when working with multiband images other than uint8. What do you think ?

@hjweide
Copy link

hjweide commented Mar 3, 2018

@vfdev-5 I also want OpenCV as a backend, but I'm not sure how to add that without duplicating functional.py and transforms.py, one for PIL and one for OpenCV. Here is a snippet from my current workaround:

def input_target_transform(image, target, height, width, thickness):
    image = cv2.resize(image, (width, height), interpolation=cv2.INTER_AREA)
    target = cv2.resize(target, (width, height), interpolation=cv2.INTER_AREA)
    # degrees, translations, scale range, shear range, image size
    angle, translate, scale, shear = RandomAffine.get_params(
        (-15, 15), (0.03125, 0.03125), (1. / 1.1, 1.1), (0., 0.),
        (width, height))

    center = (width * 0.5 + 0.5, height * 0.5 + 0.5)
    matrix = F._get_inverse_affine_matrix(
        center, angle, translate, scale, shear)
    M = np.array(matrix).reshape(2, 3)

    image = cv2.warpAffine(
        image, M, (width, height),
        flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
    target = cv2.warpAffine(
        target, M, (width, height),
        flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)

Any ideas on how to integrate OpenCV as a proper backend? I'd be willing to help out.

@fmassa
Copy link
Member

fmassa commented Mar 5, 2018

Hi,

So, one of the things I don't like that much about OpenCV compared to PIL or accimage is that OpenCV returns images in a different format than PIL (BGR instead of RGB), and we'd like for consistency to have all our operations return 0-1 RGB images.

We could patch in some sense ToTensor to take that into account (using the image_backend option), but given that OpenCV returns numpy arrays, I'm not sure how robust such solution would be, given that we already handle numpy arrays in ToTensor, and the lingua-franca that we use for the input/output of most of our transforms is PIL images.

I'm afraid having to backends with different data representations might lead to hard to spot bugs in the user code if we don't do things carefully.
I'm open to suggestions.

@edgarriba
Copy link
Contributor

@fmassa a workaround for this could be wrapping either PIL images and numpy images in a data structure so that the color space can be tracked down. An alternative could also be to add the color space conversion from BGR to RGB at loading time when the opencv backend is used.

@fmassa
Copy link
Member

fmassa commented Mar 5, 2018

@edgarriba yes. that's true.

This would involve a few more changes though, as we don't currently provide an imread functionality in torchvision, and we let the users use whatever they want to open the images. But that could be added.

So this means we would maintain our wrapper class Image, which would hold the data of the image and we could have our backends perform operations in there directly.

This might work, but I'm unsure if there are downsides I'm not seeing now that could pop up down the road.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 5, 2018

@fmassa, as you said, as torchvision does not provide imread functionality, the problem with RGB or BGR can be passed on the user side. In torchvision we can just specify the working format as RGB.

Today, transforms.py contains the most of transformations that works on PIL input, some on Tensors.

If we integrate OpenCV as backend, then all classes transforming images, like Resize should work on any image input: np.ndarray or PIL.Image.Image (accimage.Image)?

We can, probably, create a new file functional_cv2.py with equivalent methods. Or maybe, we can have a single entry with functional.py and backend dependent internal modules functional_pil.py, functional_cv2.py, functional_tensor.py?

What is the purpose of _image_backend?

@fmassa
Copy link
Member

fmassa commented Mar 5, 2018

The purpose of image_backend is to support accimage, which is a faster alternative to PIL but with limited functionality.

I just think that letting to the user the responsability to chose between RGB-BGR might be error-prone.
And we would definitely want to keep the images that we pass in RGB format for compatibility with our modelzoo.

I don't think that np.ndarray (output of opencv) has enough information to represent the images without ambiguity, so as @edgarriba mentioned we could extend it with a colorspace field to at least know if we are in RGB or HSV spaces, for example.

@hjweide
Copy link

hjweide commented Mar 6, 2018

@fmassa In my case, I want to combine existing functions from transforms.py like Resize with functions like distanceTransform from OpenCV. Thus, if the image is wrapped in an Image class, I'd have to unwrap it anyway, which is what I'm trying to avoid by using an OpenCV backend.

Would it be possible to make functional.py completely agnostic of the backend? If so, we could handle all the parameters, checks, and equations there, and then implement two versions of transforms.py, one for OpenCV and one for PIL (like @vfdev-5 suggested). There would be minimal code duplication, because parameters would be handled and the necessary transformations generated in functional.py, and then the actual transformation applied in the transforms.py corresponding to PIL or OpenCV.

Finally, for OpenCV, we could have a cv2ToTensor that converts the numpy array to RGB before converting it to a Tensor to ensure consistency.

@fmassa
Copy link
Member

fmassa commented Mar 7, 2018

@hjweide that sounds reasonable, and might be the way to go.

But the fact that you have a dedicated function for converting from cv2 to Tensor makes your code not agnostic wrt the backend. For example, what if the user by mistake uses ToTensor? This will be a subtle bug, because there will be no error messages, just the color channels will be inverted.

Also, if you want to use opencv functions for transforms, a simple way of doing it is to have a LambdaTransform that converts to numpy, calls the opencv function and back.

Also, one other reason why I am a bit worried about supporting OpenCV torchvision by default is that it doesnt' play well with multiprocessing in fork mode, so only works for Python 3+, see pytorch/pytorch#1838 and opencv/opencv#5150

@edgarriba
Copy link
Contributor

@fmassa I would go for the Image representation model. However, that's true we should find a stable workaround for the multprocessing issue. /cc @alalek @mshabunin any insights ?

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 8, 2018

I do not quite understand why Image representation model can save from RGB/BGR error. It still remains on user responsibilty to properly define an instanse of new Image specifying proper colorspace etc if data comes from "somewhere". Similar like F.to_pil_image accepts any np.ndarray (yes, we can hint a mode). Anyway, colorspace problem is relevant to color augmentations and for geometric ones I think it is OK for any colorspace.

@hjweide
Copy link

hjweide commented Mar 9, 2018

@fmassa I did not intend for the user code to be agnostic of the backend, only functional.py. The user would be responsible for composing [cv2Resize, ..., cv2ToTensor]if they choose to use the OpenCV backend. Thanks for the LambdaTransform suggestion.

I understand your concern over opencv/opencv#5150. @edgarriba It's possible to avoid that issue by building OpenCV with "-D BUILD_TBB=ON -D WITH_TBB=ON", but requiring users to build OpenCV from source is probably not an ideal solution.

Perhaps the right thing to do is to develop the OpenCV backend as a small, separate project. It will be clear that it's not officially supported and so the user is responsible for correctly building OpenCV and rearranging the color channels before using the modelzoo. That's effectively what I'm doing at the moment.

@fmassa
Copy link
Member

fmassa commented Mar 14, 2018

@hjweide yeah, I'd rather not to duplicate the number of our transforms to support opencv, ideally they should have the same interface and be easily exchangeable, but as discussed, this might bring a few complications.

@varunagrawal
Copy link
Contributor

Apologies for re-opening this thread, but I think OpenCV as a dependency is a bad idea. TBH I would rather have the user deal with the burden of using OpenCV and making sure it is converted to the right format as ndarray and then pass that to Transforms etc.

Right now, OpenCV is in a bit of a bad state in terms of CUDA support and pytorch and torchision are large enough projects to deal with already without the additional dependency.

A better idea is perhaps to have a note about OpenCV in the README?

@edgarriba
Copy link
Contributor

FYI, there's a proposed project in the GSoC at OpenCV to implement a data augmentation module.
https://github.com/opencv/opencv/wiki/GSoC_2019#idea-computer-vision-data-augmentation-module

Besides, from torchgeometry there's also a plan to cover part of the current torchvision transforms by using pure pytorch operators to avoid using any image backend.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Mar 12, 2019

@varunagrawal there is albumentations which provides transforms using OpenCV as backend similarly to torchvision and which is intended to work and works with PyTorch.

@sisrfeng
Copy link

Any update?

@datumbox
Copy link
Contributor

Closing this PR as it is over 5 years old. Since then TorchVision offers GPU accelerated Torch-based transforms, so adding another backend is not currently in our plans.

@datumbox datumbox closed this Mar 26, 2022
@sisrfeng
Copy link

Closing this PR as it is over 5 years old. Since then TorchVision offers GPU accelerated Torch-based transforms, so adding another backend is not currently in our plans.

You mean this?
image

@datumbox
Copy link
Contributor

Pretty much yes. Since v0.8 TorchVision offers 2 backends. PIL and Tensor. The latter offers GPU acceleration, JIT-scriptability and other goodies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.