
Current way to use torchvision.prototype.transforms #7168

Closed
austinmw opened this issue Feb 2, 2023 · 2 comments

Comments

austinmw commented Feb 2, 2023

📚 The doc issue

I tried to run the end-to-end example in this recent blog post:

import PIL
from torchvision import io, utils
from torchvision.prototype import features, transforms as T
from torchvision.prototype.transforms import functional as F
# Defining and wrapping input to appropriate Tensor Subclasses
path = "COCO_val2014_000000418825.jpg"
img = features.Image(io.read_image(path), color_space=features.ColorSpace.RGB)
# img = PIL.Image.open(path)
bboxes = features.BoundingBox(
    [[2, 0, 206, 253], [396, 92, 479, 241], [328, 253, 417, 332],
     [148, 68, 256, 182], [93, 158, 170, 260], [432, 0, 438, 26],
     [422, 0, 480, 25], [419, 39, 424, 52], [448, 37, 456, 62],
     [435, 43, 437, 50], [461, 36, 469, 63], [461, 75, 469, 94],
     [469, 36, 480, 64], [440, 37, 446, 56], [398, 233, 480, 304],
     [452, 39, 463, 63], [424, 38, 429, 50]],
    format=features.BoundingBoxFormat.XYXY,
    spatial_size=F.get_spatial_size(img),
)
labels = features.Label([59, 58, 50, 64, 76, 74, 74, 74, 74, 74, 74, 74, 74, 74, 50, 74, 74])
# Defining and applying Transforms V2
trans = T.Compose(
    [
        T.ColorJitter(contrast=0.5),
        T.RandomRotation(30),
        T.CenterCrop(480),
    ]
)
img, bboxes, labels = trans(img, bboxes, labels)
# Visualizing results
viz = utils.draw_bounding_boxes(F.to_image_tensor(img), boxes=bboxes)
F.to_pil_image(viz).show()

but found that torchvision.prototype.features is now gone. What's the current way to run this? I attempted to simply pass the images, bboxes, and labels with the following types: torchvision.prototype.datasets.utils._encoded.EncodedImage, torchvision.prototype.datapoints._bounding_box.BoundingBox, and torchvision.prototype.datapoints._label.Label. However, this didn't seem to apply the transforms, as everything remained the same shape.

Edit: I've found that features seems to have been renamed to datapoints. I tried applying this, but the EncodedImage in a COCO sample['image'] seems to be 1D, and prototype.transforms requires 2D images. What's the proper way to get this as 2D so I can apply the transforms? Is there a decode method I'm missing?

Suggest a potential alternative/fix

No response

cc @vfdev-5 @bjuncek @pmeier

pmeier (Collaborator) commented Feb 2, 2023

I've found that features seems to be renamed to datapoints.

Correct. That happened in #7002.

I tried applying this, but EncodedImage in a coco sample['image'] seems to be 1D and prototype.transforms requires 2D images.

All development on torchvision.prototype.datasets is on hold and thus there might be some incompatibilities. You can find our proposal on how to use the datasets v1 with the transforms v2 in #6662. We have a PoC implementation in #6663 that I'm actively working on. Happy to get your feedback there regarding this link between the two.

What's the proper way to get this as 2D so I can apply transforms? Is there a decode method I'm missing?

Our idea was for the prototype datasets to just return the raw bytes so decoding can happen however the user likes. In #6944 we made a cut and separated datasets from transforms to focus on the latter. In that PR we also removed the decoding transforms that linked the two. Here is the relevant part from the state right before the PR was merged:

@torch.jit.unused
def decode_image_with_pil(encoded_image: torch.Tensor) -> features.Image:
    image = torch.as_tensor(np.array(PIL.Image.open(ReadOnlyTensorBuffer(encoded_image)), copy=True))
    if image.ndim == 2:
        image = image.unsqueeze(2)
    return features.Image(image.permute(2, 0, 1))


class DecodeImage(Transform):
    _transformed_types = (features.EncodedImage,)

    def _transform(self, inpt: torch.Tensor, params: Dict[str, Any]) -> features.Image:
        return F.decode_image_with_pil(inpt)  # type: ignore[no-any-return]
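
In the meantime, if you just want a decoded image today, something along these lines should work as a minimal sketch (assuming the EncodedImage carries the raw, still-encoded file bytes as a 1D uint8 tensor, and that datapoints.Image wraps a plain tensor the same way features.Image did; not tested):

from torchvision import io
from torchvision.prototype import datapoints

# read_file returns the raw, still-encoded file contents as a 1D uint8 tensor,
# which is the kind of payload an EncodedImage holds.
encoded = io.read_file("COCO_val2014_000000418825.jpg")

# decode_image turns those bytes into a CHW uint8 tensor; wrapping it in
# datapoints.Image lets the v2 transforms dispatch on it.
img = datapoints.Image(io.decode_image(encoded, mode=io.ImageReadMode.RGB))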


Substituting datapoints for features in your code and removing the color_space parameter from the datapoints.Image instantiation (this happened in #7120) should be sufficient to get the example working again; see the sketch below. As such, I'm closing this. If you have general questions or feedback, the thread in #6753 might also be of interest.
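
For illustration, the adapted example might look roughly like this (a sketch with the box list trimmed to two entries; I haven't run it):

from torchvision import io, utils
from torchvision.prototype import datapoints, transforms as T
from torchvision.prototype.transforms import functional as F

# Defining and wrapping input to appropriate Tensor subclasses
path = "COCO_val2014_000000418825.jpg"
img = datapoints.Image(io.read_image(path))  # no color_space argument anymore
bboxes = datapoints.BoundingBox(
    [[2, 0, 206, 253], [396, 92, 479, 241]],  # trimmed for brevity
    format=datapoints.BoundingBoxFormat.XYXY,
    spatial_size=F.get_spatial_size(img),
)
labels = datapoints.Label([59, 58])

# Defining and applying Transforms V2
trans = T.Compose(
    [
        T.ColorJitter(contrast=0.5),
        T.RandomRotation(30),
        T.CenterCrop(480),
    ]
)
img, bboxes, labels = trans(img, bboxes, labels)

# Visualizing results
viz = utils.draw_bounding_boxes(F.to_image_tensor(img), boxes=bboxes)
F.to_pil_image(viz).show()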

austinmw (Author) commented Feb 2, 2023

Thanks, appreciate your response!
