[WIP] Add Mosaic augmentation #1147

i-aki-y · 2022-03-18T11:32:30Z

About PR

In this PR, I implemented a mosaic augmentation used in YOLO[1, 2].

I would appreciate any comments and suggestions.

[1]: "YOLOv4: Optimal Speed and Accuracy of Object Detection", https://arxiv.org/pdf/2004.10934.pdf
[2]: YOLOv5 https://github.com/ultralytics/yolov5

Demo

This is a reproducible example:

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import skimage.data  # import the submodule explicitly; a bare `import skimage` may not expose skimage.data
import albumentations as A

## define helper funcs
def add_bbox(ax, bbox, encoder):
    label = 0    
    if len(bbox) > 4:
        bbox, label = bbox[:4], bbox[4]
        label = encoder[label]
    bbox_color = plt.get_cmap("tab10").colors[label]
    x_min, y_min, x_max, y_max = bbox
    w, h = x_max - x_min, y_max - y_min
    pat = Rectangle(xy=(x_min, y_min), width=w, height=h, fill=False, lw=3, color=bbox_color)
    ax.add_patch(pat)

def plot_image_and_bboxes(image, bboxes, encoder, ax):
    ax.imshow(image)
    for i in range(len(bboxes)):
        add_bbox(ax, bboxes[i], encoder)


## data setup
encoder = {"face": 0, "rocket": 1, "other": 2}
image_list = [skimage.data.astronaut(), skimage.data.cat(), skimage.data.coffee(), skimage.data.rocket()]
bboxes_list = [
    [[170, 30, 280, 180, "face"], [350, 80, 460, 290, "rocket"], [140, 350, 200, 420, "other"]],
    [[50, 0, 350, 280, "face"]],
    [[160, 15, 420, 210, "other"]],
    [[300, 120, 340, 420, "rocket"]],
]

## define pipeline
bbox_format = 'pascal_voc'
transform = A.Compose([
    A.Mosaic(height=2*512, width=2*512, shift_limit_x=0.0, shift_limit_y=0.0, replace=False, p=1.0, fill_value=114, bboxes_format=bbox_format),
    A.RandomResizedCrop(height=512, width=512, scale=(0.4, 1.0)),
], bbox_params=A.BboxParams(format=bbox_format))


## show input images
fig, axes = plt.subplots(2, 2, figsize=(6, 6))
axes = axes.flatten()

for i in range(len(image_list)):
    ax = axes[i]
    ax.set_title(f"input{i}")
    image = image_list[i]
    bboxes = bboxes_list[i]
    plot_image_and_bboxes(image, bboxes, encoder, ax)

plt.show()   
#plt.savefig("mosaic_input.jpg", bbox_inches='tight')

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.flatten()
for ax in axes:
    data = transform(image=image_list[0], image_cache=image_list[1:], bboxes=bboxes_list[0], bboxes_cache=bboxes_list[1:])
    image = data["image"]
    bboxes = data["bboxes"]    
    plot_image_and_bboxes(image, bboxes, encoder, ax)

plt.show()       
#plt.savefig("mosaic_output.jpg", bbox_inches='tight')

Input

Some Results

Notes

Since Albumentations currently does not support multiple image sources, I introduced the helper targets image_cache and bboxes_cache as additional data sources. The user needs to pass the additional images and bboxes to these helper targets, so it is up to the user to decide how to prepare and manage the multiple image data. For example, the user can put all images into image_cache when there is sufficient memory or the dataset is small, or read only a small number of images at each iteration.

Note that this PR version does not support the labels_cache target. This means that the user should embed the label information inside the bounding boxes like [xmin, ymin, xmax, ymax, label] (when bboxes_format=pascal_voc).
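As a minimal sketch of the layout described above, the helper below appends each label to its pascal_voc box so the label travels with the box. The name embed_labels is hypothetical; it is not part of Albumentations or of this PR.

```python
def embed_labels(bboxes, labels):
    """Append each label to its [xmin, ymin, xmax, ymax] box.

    Hypothetical helper for illustration only.
    """
    if len(bboxes) != len(labels):
        raise ValueError("bboxes and labels must have the same length")
    # Keep only the four coordinates, then add the label as a fifth element.
    return [list(bbox[:4]) + [label] for bbox, label in zip(bboxes, labels)]


bboxes = [[170, 30, 280, 180], [350, 80, 460, 290]]
labels = ["face", "rocket"]
print(embed_labels(bboxes, labels))
# [[170, 30, 280, 180, 'face'], [350, 80, 460, 290, 'rocket']]
```

The result has the same shape as the entries of bboxes_list in the demo, so it could be passed as bboxes or as one element of bboxes_cache.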

Another limitation is that Mosaic should be the first transform in the pipeline, as in the example above. Transforms placed before Mosaic are applied only to the image passed as the image target, while the additional images passed as image_cache are ignored. As a result, the images in image and image_cache would have different augmentation histories, which I think is not what users expect. For example, with the following pipeline, Normalize and RandomResizedCrop are applied only to image, not to any image in image_cache.

transform = A.Compose([
    A.Normalize(...),
    A.RandomResizedCrop(...),
    A.Mosaic(...),
    ...
])
data = transform(image=image, image_cache=image_cache)

I think this is not a serious limitation because users can prepare two pipelines and apply them separately if needed.

preprocess = A.Compose([    
    A.Normalize(...),
    A.RandomResizedCrop(...),
])

transform = A.Compose([
    A.Mosaic(...),
])

batch = [preprocess(image=img, bboxes=bbs) for img, bbs in zip(image_batch, bboxes_batch)]
image_batch = [data["image"] for data in batch]
bboxes_batch = [data["bboxes"] for data in batch]
# Use the first preprocessed sample as the primary target and the rest as the cache.
data = transform(image=image_batch[0], image_cache=image_batch[1:], ...)

The same strategy can be applied to other multi-image augmentations such as MixUp.
For example, I think an augmentation similar to the one used in YOLOv5 could be written as follows.

mosaic_aug = A.Compose([
    A.Mosaic(...),
    A.Affine(...),
    A.RandomResizedCrop(...),
])
mixup_aug = A.Compose([
    A.MixUp(...),  # not included in this PR
])

mosaic1 = mosaic_aug(image=image1, image_cache=image_cache, bboxes=bboxes1, bboxes_cache=bboxes_cache)
mosaic2 = mosaic_aug(image=image2, image_cache=image_cache, bboxes=bboxes2, bboxes_cache=bboxes_cache)
mosaic_mixup = mixup_aug(image=mosaic1["image"], bboxes=mosaic1["bboxes"], image_cache=mosaic2["image"], bboxes_cache=mosaic2["bboxes"])

Implementation Notes

The `target_dependence` property is used

I used the target_dependence property, instead of get_params_dependent_on_targets, to pass the helper targets to the apply_xxx functions.

This is because the values returned by get_params and get_params_dependent_on_targets become targets of serialization, the mechanism used for "replay". Since I think these helper targets are not appropriate for serialization, I used the target_dependence property mechanism instead.

Mosaic center is fixed

The YOLOv5 implementation randomizes the mosaic center position.

https://github.com/ultralytics/yolov5/blob/7c6a33564a84a0e78ec19da66ea6016d51c32e0a/utils/datasets.py#L653

I excluded this feature from the PR version because the same effect can be obtained by applying RandomResizedCrop right after Mosaic, as in the demo above.
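To make the fixed-center geometry concrete, here is a minimal sketch of how a pascal_voc box could be mapped into a 2x2 mosaic. It assumes the YOLOv5-style layout, in which each source image touches the mosaic center with one corner; the placement used in this PR may differ, and shift_bbox_to_mosaic is a hypothetical name.

```python
def shift_bbox_to_mosaic(bbox, image_hw, position, mosaic_hw):
    """Map a pascal_voc bbox from its source image into a 2x2 mosaic.

    Assumed layout (illustration only): the center is fixed at the middle of
    the canvas and each image touches it with one corner.
    position: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
    Returns None when the box falls entirely outside the mosaic canvas.
    """
    x_min, y_min, x_max, y_max = bbox[:4]
    rows, cols = image_hw
    height, width = mosaic_hw
    yc, xc = height // 2, width // 2  # fixed mosaic center

    # Offset of the source image's top-left corner on the mosaic canvas.
    dx = xc - cols if position in (0, 2) else xc
    dy = yc - rows if position in (0, 1) else yc

    # Shift the box, then clip it to the canvas.
    x_min, x_max = (min(max(v + dx, 0), width) for v in (x_min, x_max))
    y_min, y_max = (min(max(v + dy, 0), height) for v in (y_min, y_max))
    if x_max <= x_min or y_max <= y_min:
        return None
    # Carry over any extra fields, such as an embedded label.
    return [x_min, y_min, x_max, y_max, *bbox[4:]]


print(shift_bbox_to_mosaic([170, 30, 280, 180, "face"], (512, 512), 3, (1024, 1024)))
# [682, 542, 792, 692, 'face']
```

Under this assumption the mosaic itself is deterministic, so a random crop applied afterwards is the only thing that moves the apparent center.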

TODO

implement apply_to_keypoints
write tests
bboxes_cache preprocessing should be done in Compose if possible.

mikel-brostrom · 2023-01-22T11:51:59Z

Any updates on this?

mikel-brostrom · 2023-02-20T17:30:17Z

I tried this out together with some rotation augmentations and it seems to work, @i-aki-y.

However, from time to time this error arises:

ValueError: y_max is less than or equal to y_min for bbox

when using COCO. Any idea how to fix this @i-aki-y?

Moreover, the structure of the repo must have changed since the PR was created as some refactoring was needed.

mikel-brostrom · 2023-02-20T17:52:21Z

Since Albumentations currently does not support multiple image sources, I introduced the helper targets image_cache and bboxes_cache as additional data sources

Couldn't this be solved by using additional_targets like here?
This would allow the loaded images to be augmented in different ways before mosaic 🚀

mikel-brostrom · 2023-02-21T16:45:27Z

Setting the width and height for Mosaic requires some basic knowledge of the dataset you are working with, as most of the image could be left outside otherwise. Maybe this should be reflected in the docstrings as well. I guess a good set of initial values could be the average width and height of the images in the dataset, @i-aki-y?
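A minimal sketch of that heuristic, assuming the image shapes are already known as (height, width, ...) tuples. The name mean_image_size is hypothetical, and the averages are only a possible starting point for height and width, not a verified recommendation.

```python
def mean_image_size(shapes):
    """Return the rounded mean (height, width) of (height, width, ...) shapes.

    Hypothetical helper for illustration only.
    """
    if not shapes:
        raise ValueError("shapes must not be empty")
    mean_h = sum(shape[0] for shape in shapes) / len(shapes)
    mean_w = sum(shape[1] for shape in shapes) / len(shapes)
    return round(mean_h), round(mean_w)


# Example shapes chosen for illustration.
shapes = [(480, 640, 3), (600, 800, 3), (720, 1280, 3)]
height, width = mean_image_size(shapes)
print(height, width)
```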

i-aki-y · 2023-02-22T07:10:39Z

@mikel-brostrom Sorry for the delay, and thank you for your feedback!

ValueError: y_max is less than or equal to y_min for bbox

This means that some bboxes have zero or negative heights.
Did you get this error only when you used mosaic transform?

Moreover, the structure of the repo must have changed since the PR was created as some refactoring was needed.

OK. I will check and update the PR.

Couldn't this be solved by using additional_targets like here?
This would allow the loaded images to be augmented in different ways before mosaic 🚀

I missed this feature! I will look into whether I can use it for this transform.

mikel-brostrom · 2023-02-22T07:28:56Z

I have it working locally, so I could create a pull request with what I have, @i-aki-y. I also have MixUp working as you suggested:

mosaic1 = mosaic_aug(image=image1, image_cache=image_cache, bboxes=bboxes1, bboxes_cache=bboxes_cache)
mosaic2 = mosaic_aug(image=image2, image_cache=image_cache, bboxes=bboxes2, bboxes_cache=bboxes_cache)
mosaic_mixup = mixup_aug(image=mosaic1["image"], bboxes=mosaic1["bboxes"], image_cache=mosaic2["image"], bboxes_cache=mosaic2["bboxes"])

Maybe this should go into a separate PR?

mikel-brostrom · 2023-02-22T07:31:43Z

This means that some bboxes have zero or negative heights.
Did you get this error only when you used mosaic transform?

Yes. I had multiple augmentations in my augmentation stack, but the error always appeared during Mosaic. Whenever the error emerged, y_max was equal to y_min. This couldn't be fixed by setting min_area in A.BboxParams, btw...

i-aki-y · 2023-02-22T08:23:54Z

@mikel-brostrom

Yes. I had multiple augmentations in my augmentation stack, but the error always appeared during Mosaic. Whenever the error emerged, y_max was equal to y_min. This couldn't be fixed by setting min_area in A.BboxParams, btw...

Thanks, I will investigate it.

I have it working locally, so I could create a pull request with what I have, @i-aki-y. I also have MixUp working as you suggested:

Great!

Maybe this should go into a separate PR?

Yes.

mikel-brostrom · 2023-02-23T08:58:15Z

Feel free to check out my MixUp implementation here @i-aki-y . Any feedback is appreciated. It works nicely with this Mosaic implementation 😄. I am going for CutMix 🚀

mikel-brostrom · 2023-02-24T08:48:33Z

Btw, I have implemented this in a slightly different manner.

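import random
from typing import Any, Dict, List, Tuple

from albumentations.core.transforms_interface import DualTransform

# mosaic4 and bbox_mosaic4 are helper functions that are not shown in this comment.
class Mosaic(DualTransform):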
    def __init__(
        self,
        height,
        width,
        replace=True,
        fill_value=0,
        bboxes_format="coco",
        always_apply=False,
        p=0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.height = height
        self.width = width
        self.replace = replace
        self.fill_value = fill_value
        self.bboxes_format = bboxes_format
        self.images = []
        self.bboxes = []

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("height", "width", "replace", "fill_value", "bboxes_format")

    def apply(self, image, **params):
        return mosaic4(self.images, self.height, self.width, self.fill_value)

    def apply_to_keypoint(self, **params):
        pass  # TODO
    
    def apply_to_bbox(self, bbox, image_shape, position, height, width, **params):
        rows, cols = image_shape[:2]
        return bbox_mosaic4(bbox, rows, cols, position, height, width)
    
    def apply_to_bboxes(self, bboxes, **params):
        new_bboxes = []
        for i, (bbox, im) in enumerate(zip(self.bboxes, self.images)):
            im_shape = im.shape
            h, w, _ = im_shape
            for b in bbox:
                new_bbox = self.apply_to_bbox(b, im_shape, i, self.height, self.width)
                new_bboxes.append(new_bbox)
        return new_bboxes

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        self.images = [params['image'], params['image1'], params['image2'], params['image3']]
        self.bboxes = [params['bboxes'], params['bboxes1'], params['bboxes2'], params['bboxes3']]
        images_bboxes = list(zip(self.images, self.bboxes))
        random.shuffle(images_bboxes)
        self.images, self.bboxes = zip(*images_bboxes)
        return {}
        
    @property
    def targets_as_params(self) -> List[str]:
        return [
            "image", "image1", "image2", "image3",
            "bboxes", "bboxes1", "bboxes2", "bboxes3"
        ]

I am trying to follow the recommended way of working with multiple images and bboxes. However, I see that apply and apply_to_bbox are called as many times as there are targets. Any ideas on how to circumvent this, @i-aki-y?

i-aki-y · 2023-02-27T04:39:49Z

@mikel-brostrom

ValueError: y_max is less than or equal to y_min for bbox

I found that some COCO annotations have bboxes with height == 0.0.
This is the cause of the error.

import json
import pathlib
coco_annot_path = pathlib.Path("coco/annotations/instances_train2017.json")
with open(coco_annot_path) as f:
    coco_annots = json.load(f)
for item in coco_annots["annotations"]:
    x, y, w, h = item["bbox"]
    if w == 0 or h == 0:
        print(item)

> {'segmentation': [[296.65, 388.33, 296.65, 388.33, 297.68, 388.33, 297.68, 388.33]], 'area': 0.0, 'iscrowd': 0, 'image_id': 200365, 'bbox': [296.65, 388.33, 1.03, 0.0], 'category_id': 58, 'id': 918}
> {'segmentation': [[9.98, 188.56, 15.52, 188.56, 15.52, 188.56, 11.09, 188.56]], 'area': 0.0, 'iscrowd': 0, 'image_id': 550395, 'bbox': [9.98, 188.56, 5.54, 0.0], 'category_id': 1, 'id': 2206849}

mikel-brostrom · 2023-02-27T06:59:38Z

But this should be avoided by bbox_params=A.BboxParams(format='coco', min_area=1), right @i-aki-y? I tried this, but it didn't work for me.

i-aki-y · 2023-02-27T09:27:07Z

@mikel-brostrom No, the filters are applied in post-processing, while the error occurs in the pre-processing validation.

I think they have different purposes.
The filters are necessary because some transforms produce bboxes with zero or tiny areas by design.
But invalid data in the input suggests that something went wrong in an earlier step, which should be fixed.
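One way to fix the input is to drop degenerate boxes before they reach the pipeline. The sketch below assumes COCO-format boxes [x, y, w, h, ...]; drop_degenerate_coco_bboxes is a hypothetical helper, not an Albumentations function.

```python
def drop_degenerate_coco_bboxes(bboxes):
    """Keep only COCO-format boxes [x, y, w, h, ...] with positive width and height.

    Hypothetical helper for illustration only.
    """
    return [bbox for bbox in bboxes if bbox[2] > 0 and bbox[3] > 0]


bboxes = [
    [296.65, 388.33, 1.03, 0.0, 58],  # zero height, from the COCO annotation found above
    [50.0, 20.0, 300.0, 280.0, 1],    # valid box, made up for this example
]
print(drop_degenerate_coco_bboxes(bboxes))
# [[50.0, 20.0, 300.0, 280.0, 1]]
```

Filtering at load time keeps the input validation strict while leaving min_area and similar filters to handle boxes that the transforms themselves shrink.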

Add 2x2 Mosaic augmentation

886e3e8

i-aki-y mentioned this pull request Mar 18, 2022

CutMix and Mosaic Augmentation #677

Open

Dipet added the WIP label Jun 11, 2022

ternaus added the enhancement New feature or request label Jul 6, 2022

chAwater mentioned this pull request Jul 20, 2022

Mixup or CutMix augmentations #340

Open

i-aki-y added 2 commits February 27, 2023 13:46

Merge branch 'master' into add-mosaic-aug

c1b0a4c

Remove unused type imports

4cf2dd3

mikel-brostrom mentioned this pull request Mar 2, 2023

Added MixUp augmentation #1409

Closed

i-aki-y mentioned this pull request Mar 8, 2023

A proposal of a framework to handle multi-image augmentation (Including mosaic augmentation) #1420

Closed

ternaus closed this Oct 31, 2024
