ValueError: All bounding boxes should have positive height and width. Found invalid box [500.728515625, 533.3333129882812, 231.10546875, 255.2083282470703] for target at index 0. #2740

Closed
kashf99 opened this issue Oct 2, 2020 · 28 comments


@kashf99

kashf99 commented Oct 2, 2020

I am training Detecto for custom object detection. Can anyone help me as soon as possible? I would be very grateful.
Here is the code:
from detecto import core, utils, visualize
dataset = core.Dataset('content/sample_data/newdataset/car/images/')
model = core.Model(['car'])
model.fit(dataset)

here is the output:

ValueError Traceback (most recent call last)
in ()
4 model = core.Model(['car'])
5
----> 6 model.fit(dataset)

2 frames
/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
91 raise ValueError("All bounding boxes should have positive height and width."
92 " Found invalid box {} for target at index {}."
---> 93 .format(degen_bb, target_idx))
94
95 features = self.backbone(images.tensors)

ValueError: All bounding boxes should have positive height and width. Found invalid box [500.728515625, 533.3333129882812, 231.10546875, 255.2083282470703] for target at index 0.

@oke-aditya
Contributor

I guess you have a degenerate box case. The boxes should be in (xmin, ymin, xmax, ymax) format for Faster R-CNN to work.
Your bounding boxes appear to be in the opposite convention, which is the degenerate case.
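
A quick way to confirm this is to scan the targets for boxes that violate xmin < xmax and ymin < ymax before training. A minimal sketch in plain PyTorch (not part of Detecto's API; the helper name is my own):

import torch

def find_degenerate_boxes(boxes: torch.Tensor) -> torch.Tensor:
    """Return indices of boxes that do not satisfy xmin < xmax and ymin < ymax."""
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    return torch.where((widths <= 0) | (heights <= 0))[0]

# Example: the box from the error message, whose last two values are smaller than the first two.
bad = torch.tensor([[500.73, 533.33, 231.11, 255.21]])
print(find_degenerate_boxes(bad))  # tensor([0]) -> degenerate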

@fmassa
Member

fmassa commented Oct 2, 2020

Hi,

The answer from @oke-aditya is correct. You are probably passing to the model bounding boxes in the format [xmin, ymin, width, height], while Faster R-CNN expects boxes to be in [xmin, ymin, xmax, ymax] format.

Changing this should fix the issue.

We have, by the way, recently added box-conversion utilities to torchvision (thanks to @oke-aditya); they can be found in:

def box_convert(boxes: Tensor, in_fmt: str, out_fmt: str) -> Tensor:
    """
    Converts boxes from given in_fmt to out_fmt.
    Supported in_fmt and out_fmt are:
    'xyxy': boxes are represented via corners, x1, y1 being top left and x2, y2 being bottom right.
    'xywh': boxes are represented via corner, width and height, x1, y1 being top left, w, h being width and height.
    'cxcywh': boxes are represented via centre, width and height, cx, cy being center of box, w, h being width and height.

    Arguments:
        boxes (Tensor[N, 4]): boxes which will be converted.
        in_fmt (str): Input format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].
        out_fmt (str): Output format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].

    Returns:
        boxes (Tensor[N, 4]): Boxes in the converted format.
    """

@kashf99
Author

kashf99 commented Oct 2, 2020

So should I change my XML file format?

@fmassa
Member

fmassa commented Oct 2, 2020

@kashf99 this question is better suited to the detecto repo, and this is part of their API. https://github.com/alankbi/detecto

@kashf99
Author

kashf99 commented Oct 2, 2020

Ok thank you

@kashf99
Author

kashf99 commented Oct 2, 2020

I guess you have a degenerate box case. The boxes should be in (xmin, ymin, xmax, ymax) format for Faster R-CNN to work.
Your bounding boxes appear to be in the opposite convention, which is the degenerate case.

Yeah, thank you, it worked. But it is very slow, and I get a warning: "Overload of nonzero is deprecated."

@fmassa
Member

fmassa commented Oct 2, 2020

Overload of nonzero is deprecated.

This has been fixed in torchvision master since #2705

@MALLI7622

MALLI7622 commented Jan 13, 2021

Hi @fmassa. I am also getting the same error, but I passed [xmin, ymin, xmax, ymax] to the model. Can someone help me out?

@oke-aditya
Contributor

Can you post details so that we can reproduce the issue?

@MALLI7622

@oke-aditya What should I share, code or abstract details?

@oke-aditya
Contributor

Any code sample that can help people reproduce the error you get.

@MALLI7622

boxes.append([xmin, ymin, xmax, ymax])
boxes = torch.as_tensor(boxes, dtype=torch.float32)
These are the box coordinates I'm passing.

@fmassa
Member

fmassa commented Jan 20, 2021

@MALLI7622 make sure that xmin < xmax and that ymin < ymax for all boxes
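
A minimal sketch of such a check, run once over a dataset before training (it assumes the usual torchvision convention where each sample is an (image, target) pair and target["boxes"] holds the boxes):

def check_boxes(dataset):
    """Print every sample whose boxes violate xmin < xmax or ymin < ymax."""
    for idx in range(len(dataset)):
        _, target = dataset[idx]
        boxes = target["boxes"]
        bad = (boxes[:, 2] <= boxes[:, 0]) | (boxes[:, 3] <= boxes[:, 1])
        if bad.any():
            print(f"sample {idx}: invalid boxes {boxes[bad].tolist()}")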

@MALLI7622

MALLI7622 commented Jan 21, 2021

@fmassa I resolved the issue 4 days back, thanks for your help. Now I am hitting another problem with Faster R-CNN: my model produces the values below and I don't know how to resolve this. I changed the class indices to start from 1 instead of 0 and increased the number of output classes by 1 because of starting with 1. Can you help me resolve this issue?
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

When I predict with this model, I don't get anything. It predicts this:
[{'boxes': tensor([], device='cuda:0', size=(0, 4)),
'labels': tensor([], device='cuda:0', dtype=torch.int64),
'scores': tensor([], device='cuda:0')}]

@fmassa
Member

fmassa commented Jan 21, 2021

@MALLI7622 this might be due to many things. I would encourage you to start with the finetuning tutorial in https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html , as maybe you are not training for long enough.

@clothme-io

@MALLI7622 How did you resolve the issue? I am having a similar issue with a custom dataset of 39 classes (including background). Any help will do. Thanks.

@MALLI7622

@clothme-io Can you share your sample dataset file and also your custom dataset class? I'll try to help you with it.

@clothme-io

@MALLI7622 Sure, I can share it here as well as email it to you. And thank you for the help.

How I Generated The Dataset:

  1. Annotated the image with labelme (multiple parts in a single image).

  2. Generated a mask image (image below) from the annotated image.
    (mask image: person95)

  3. Then I used the code here: to generate the segmentation images (image below) that I loaded into the model.
    (segmentation image: person95)

Here is my custom dataset class:
class PersonDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "seg_image_use"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "seg_mask_use"))))

    def __getitem__(self, idx):
        # load one image and mask using idx
        img_path = os.path.join(self.root, "seg_image_use", self.imgs[idx])
        mask_path = os.path.join(self.root, "seg_mask_use", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)

        mask = np.asarray(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)[1:]  # first id is the background, so remove it
        masks = mask == obj_ids[:, None, None]  # split the color-encoded mask into a set of binary masks
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []

        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        target = {}
        target["boxes"] = boxes
        target["labels"] = torch.as_tensor(obj_ids, dtype=torch.int64) - 1
        target["masks"] = torch.as_tensor(masks, dtype=torch.uint8)
        target["image_id"] = torch.tensor([idx])
        target["area"] = area
        target["iscrowd"] = torch.zeros((num_objs,), dtype=torch.int64)  # suppose all instances are not crowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
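
A common cause of this error with mask-derived boxes is an instance mask that is only one pixel wide or tall, which yields xmin == xmax or ymin == ymax. A minimal, self-contained sketch of a helper that __getitem__ could use to drop such instances (the helper name masks_to_valid_boxes is mine, not part of the original code):

import numpy as np
import torch

def masks_to_valid_boxes(masks):
    """Compute (xmin, ymin, xmax, ymax) boxes from binary masks of shape (N, H, W),
    dropping instances whose box would have zero width or height."""
    boxes, keep = [], []
    for i, m in enumerate(masks):
        pos = np.where(m)
        xmin, xmax = np.min(pos[1]), np.max(pos[1])
        ymin, ymax = np.min(pos[0]), np.max(pos[0])
        if xmax > xmin and ymax > ymin:  # keep only boxes with positive width and height
            boxes.append([xmin, ymin, xmax, ymax])
            keep.append(i)
    return torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4), keep

Inside __getitem__ you would then index masks, obj_ids and the other target fields with keep so that everything stays aligned.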

@OrielBanne

Hi -

the example in torchvision is:

model22 = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

For training

images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)
output = model22(images, targets)

For inference

model22.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model22(x)

optionally, if you want to export the model to ONNX:

torch.onnx.export(model22, x, "faster_rcnn.onnx", opset_version = 11)

https://pytorch.org/vision/master/models.html#torchvision.models.detection.fasterrcnn_resnet50_fpn

and I get the same error:

ValueError: All bounding boxes should have positive height and width. Found invalid box [0.5358670949935913, 0.6406093239784241, 0.873319149017334, 0.33925700187683105] for target at index 0.

@fmassa
Member

fmassa commented Aug 13, 2021

@OrielBanne One of your bounding boxes has a negative height; I would recommend checking your training data.

@mrinath123
Contributor

@OrielBanne Yes, I got the same error while using this; maybe producing random bboxes (torch.rand(4, 11, 4)) is creating the problem.
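
With torch.rand there is indeed no guarantee that the second corner lies below and to the right of the first one. One way to make that smoke test produce well-formed boxes (a sketch, not the official docs example; the sizes are arbitrary):

import torch

images = [torch.rand(3, 600, 1200) for _ in range(4)]
labels = torch.randint(1, 91, (4, 11))

# Build boxes whose second corner is strictly below and to the right of the first one.
xy_min = torch.rand(4, 11, 2) * 256        # arbitrary top-left corners, in pixels
wh = torch.rand(4, 11, 2) * 128 + 1.0      # strictly positive widths and heights
boxes = torch.cat([xy_min, xy_min + wh], dim=-1)  # (xmin, ymin, xmax, ymax)

targets = [{"boxes": boxes[i], "labels": labels[i]} for i in range(len(images))]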

@Esraanageh22

I have the same error (screenshot attached), and I have checked the data (screenshots attached).

@santhoshnumberone

santhoshnumberone commented Apr 29, 2022

I have a similar issue

Following this tutorial, "Building Your Own Object Detector Pytorch Vs Tensorflow And How To Even Get Started", to use transfer learning to train a custom dataset.

Running on Google Colab using CPU
Pytorch version: 1.11.0+cu113
Python version: Python 3.7.13

Cloned the GitHub repo of pytorch/vision as mentioned and copy-pasted the version 0.3.3 files from vision/references/detection into the working directory:

references/detection/utils.py ../
references/detection/transforms.py ../
references/detection/coco_eval.py ../
references/detection/engine.py ../
references/detection/coco_utils.py ../

The model I am using:

# load an object detection model pre-trained on COCO
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

I manually checked the CSV file to see if any bounding box values were negative, but I couldn't find any.

I added a print statement inside engine.py, where the error points, to check for negative bounding box values:

    for images, targets in metric_logger.log_every(data_loader, print_freq, header):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        print("######################",targets)

        loss_dict = model(images, targets)

Print-statement output of the targets where the error points; I don't see a single negative value:

###################### [{'boxes': tensor([[ 98., 672., 829., 864.]]), 'labels': tensor([1]), 'image_id': tensor([734]), 'area': tensor([140352.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[262.,  85., 463., 275.]]), 'labels': tensor([1]), 'image_id': tensor([110]), 'area': tensor([38190.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 82., 275., 259., 281.]]), 'labels': tensor([1]), 'image_id': tensor([296]), 'area': tensor([1062.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 85.,   0., 357., 238.]]), 'labels': tensor([1]), 'image_id': tensor([68]), 'area': tensor([64736.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[188., 400., 730., 880.]]), 'labels': tensor([1]), 'image_id': tensor([788]), 'area': tensor([260160.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 40., 118., 320., 155.]]), 'labels': tensor([1]), 'image_id': tensor([598]), 'area': tensor([10360.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0., 245., 293., 347.]]), 'labels': tensor([1]), 'image_id': tensor([605]), 'area': tensor([29886.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[201., 838., 611., 621.]]), 'labels': tensor([1]), 'image_id': tensor([696]), 'area': tensor([-88970.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[488., 669., 774., 541.]]), 'labels': tensor([1]), 'image_id': tensor([985]), 'area': tensor([-36608.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[129., 242., 138., 119.]]), 'labels': tensor([1]), 'image_id': tensor([813]), 'area': tensor([-1107.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 36.,  77., 258., 247.]]), 'labels': tensor([1]), 'image_id': tensor([1780]), 'area': tensor([37740.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 66.,  49., 308., 283.]]), 'labels': tensor([1]), 'image_id': tensor([868]), 'area': tensor([56628.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 23., 182., 343., 318.]]), 'labels': tensor([1]), 'image_id': tensor([1290]), 'area': tensor([43520.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[215.,   0., 500., 266.]]), 'labels': tensor([1]), 'image_id': tensor([111]), 'area': tensor([75810.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 99., 105., 349., 210.]]), 'labels': tensor([1]), 'image_id': tensor([1350]), 'area': tensor([26250.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[319., 842., 384., 541.]]), 'labels': tensor([1]), 'image_id': tensor([803]), 'area': tensor([-19565.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0.,  19., 269., 283.]]), 'labels': tensor([1]), 'image_id': tensor([409]), 'area': tensor([71016.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 31.,   0., 360., 339.]]), 'labels': tensor([1]), 'image_id': tensor([1651]), 'area': tensor([111531.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0., 714., 585., 646.]]), 'labels': tensor([1]), 'image_id': tensor([989]), 'area': tensor([-39780.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 51., 170., 314., 317.]]), 'labels': tensor([1]), 'image_id': tensor([1449]), 'area': tensor([38661.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[394.,  66., 640., 294.]]), 'labels': tensor([1]), 'image_id': tensor([177]), 'area': tensor([56088.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[396., 723., 592., 627.]]), 'labels': tensor([1]), 'image_id': tensor([940]), 'area': tensor([-18816.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 95.,  54., 360., 187.]]), 'labels': tensor([1]), 'image_id': tensor([1579]), 'area': tensor([35245.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 67., 112., 293., 307.]]), 'labels': tensor([1]), 'image_id': tensor([1508]), 'area': tensor([44070.]), 'iscrowd': 
tensor([0])}, {'boxes': tensor([[ 11.,   0., 452., 355.]]), 'labels': tensor([1]), 'image_id': tensor([1162]), 'area': tensor([156555.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[268., 515., 698., 746.]]), 'labels': tensor([1]), 'image_id': tensor([741]), 'area': tensor([99330.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[156., 851., 598., 624.]]), 'labels': tensor([1]), 'image_id': tensor([900]), 'area': tensor([-100334.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 44., 123., 341., 305.]]), 'labels': tensor([1]), 'image_id': tensor([680]), 'area': tensor([54054.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[235.,   0., 598., 282.]]), 'labels': tensor([1]), 'image_id': tensor([1163]), 'area': tensor([102366.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 43., 156., 277., 289.]]), 'labels': tensor([1]), 'image_id': tensor([360]), 'area': tensor([31122.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 94.,   0., 266., 250.]]), 'labels': tensor([1]), 'image_id': tensor([1591]), 'area': tensor([43000.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 71.,  38., 343., 322.]]), 'labels': tensor([1]), 'image_id': tensor([1809]), 'area': tensor([77248.]), 'iscrowd': tensor([0])}]

I get this error:

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
###################### [targets printout identical to the one shown above]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-13-f8100031e21d>](https://localhost:8080/#) in <module>()
      2 for epoch in range(num_epochs):
      3     # train for one epoch, printing every 10 iterations
----> 4     train_one_epoch(model, optimizer, data_loader, device, epoch,print_freq=10)
      5     # update the learning rate
      6     lr_scheduler.step()

2 frames
[/content/drive/MyDrive/PytorchObjectDetector/engine.py](https://localhost:8080/#) in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq)
     30         print("######################",targets)
     31 
---> 32         loss_dict = model(images, targets)
     33 
     34         losses = sum(loss for loss in loss_dict.values())

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py](https://localhost:8080/#) in forward(self, images, targets)
     89                     degen_bb: List[float] = boxes[bb_idx].tolist()
     90                     raise ValueError(
---> 91                         "All bounding boxes should have positive height and width."
     92                         f" Found invalid box {degen_bb} for target at index {target_idx}."
     93                     )

ValueError: All bounding boxes should have positive height and width. Found invalid box [139.397216796875, 581.7989501953125, 423.73980712890625, 431.1422119140625] for target at index 7.

I am sure the problem has been addressed long back, judging by the responses given here.

But I looked at this post on Stack Overflow suffering from the same error: ValueError: All bounding boxes should have positive height and width

Could any of you guide me on what exactly should be changed, and where?

I will surely write a Medium blog on PyTorch object detection from custom data using transfer learning after I have sorted out these few minor hiccups.

@fmassa I guess you could help me sort this issue out.

@abhi-glitchhg
Contributor

Hey @santhoshnumberone ,
refer to @oke-aditya's comment here: #2740 (comment). The bounding boxes should be in the form (xmin, ymin, xmax, ymax).

In your bounding box data there are a few data points which do not fit the above format; some of them are:

tensor([[201., 838., 611., 621.]])
tensor([[488., 669., 774., 541.]])
tensor([[129., 242., 138., 119.]])
tensor([[319., 842., 384., 541.]])
tensor([[  0., 714., 585., 646.]])
tensor([[396., 723., 592., 627.]])
tensor([[156., 851., 598., 624.]])

So first you need to check the format of the bounding boxes that you have, and then convert them to (xmin, ymin, xmax, ymax) format.
This function might be helpful for converting the bounding boxes:

def box_convert(boxes: Tensor, in_fmt: str, out_fmt: str) -> Tensor:
    """
    Converts boxes from given in_fmt to out_fmt.
    Supported in_fmt and out_fmt are:
    'xyxy': boxes are represented via corners, x1, y1 being top left and x2, y2 being bottom right.
    'xywh': boxes are represented via corner, width and height, x1, y1 being top left, w, h being width and height.
    'cxcywh': boxes are represented via centre, width and height, cx, cy being center of box, w, h being width and height.

    Arguments:
        boxes (Tensor[N, 4]): boxes which will be converted.
        in_fmt (str): Input format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].
        out_fmt (str): Output format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].

    Returns:
        boxes (Tensor[N, 4]): Boxes in the converted format.
    """
    allowed_fmts = ("xyxy", "xywh", "cxcywh")
    assert in_fmt in allowed_fmts
    assert out_fmt in allowed_fmts
    if in_fmt == out_fmt:
        boxes_converted = boxes.clone()
        return boxes_converted
    if in_fmt != 'xyxy' and out_fmt != 'xyxy':
        if in_fmt == "xywh":
            boxes_xyxy = _box_xywh_to_xyxy(boxes)
            if out_fmt == "cxcywh":
                boxes_converted = _box_xyxy_to_cxcywh(boxes_xyxy)
        elif in_fmt == "cxcywh":
            boxes_xyxy = _box_cxcywh_to_xyxy(boxes)
            if out_fmt == "xywh":
                boxes_converted = _box_xyxy_to_xywh(boxes_xyxy)
    # convert one to xyxy and change either in_fmt or out_fmt to xyxy
    else:
        if in_fmt == "xyxy":
            if out_fmt == "xywh":
                boxes_converted = _box_xyxy_to_xywh(boxes)
            elif out_fmt == "cxcywh":
                boxes_converted = _box_xyxy_to_cxcywh(boxes)
        elif out_fmt == "xyxy":
            if in_fmt == "xywh":
                boxes_converted = _box_xywh_to_xyxy(boxes)
            elif in_fmt == "cxcywh":
                boxes_converted = _box_cxcywh_to_xyxy(boxes)
    return boxes_converted
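
In this particular dump the x coordinates already satisfy xmin < xmax and only the y coordinates look swapped, so another option (a sketch, assuming the values really are swapped corners rather than a different box format) is to reorder the corners explicitly:

import torch

def reorder_corners(boxes: torch.Tensor) -> torch.Tensor:
    """Enforce (xmin, ymin, xmax, ymax) ordering per coordinate pair."""
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack(
        [torch.min(x1, x2), torch.min(y1, y2), torch.max(x1, x2), torch.max(y1, y2)],
        dim=-1,
    )

print(reorder_corners(torch.tensor([[201., 838., 611., 621.]])))
# tensor([[201., 621., 611., 838.]])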

I hope this helps.

@oke-aditya
Contributor

Also note that if you are trying to train an object detection model you should use

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

since mask_rcnn is an instance segmentation model, which will expect segmentation masks during training.
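
If you go that route with a custom number of classes, the finetuning tutorial's recipe is roughly the following (a sketch; num_classes = 2 is just an assumption for one foreground class plus background):

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 2  # 1 object class + background (assumption for this sketch)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# Replace the pre-trained box predictor head with one sized for num_classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)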

@santhoshnumberone

santhoshnumberone commented Apr 29, 2022

Hey @santhoshnumberone, refer to @oke-aditya's comment here: #2740 (comment). The bounding boxes should be in the form (xmin, ymin, xmax, ymax).

In your bounding box data there are a few data points which do not fit the above format; some of them are:

tensor([[201., 838., 611., 621.]])
tensor([[488., 669., 774., 541.]])
tensor([[129., 242., 138., 119.]])
tensor([[319., 842., 384., 541.]])
tensor([[  0., 714., 585., 646.]])
tensor([[396., 723., 592., 627.]])
tensor([[156., 851., 598., 624.]])

So first you need to check the format of the bounding boxes that you have, and then convert them to (xmin, ymin, xmax, ymax) format. This function might be helpful for converting the bounding boxes.

I hope this helps.

Thank you for highlighting the issue, will look into it.
I blindly trusted a popular online image labelling tool to annotate my custom data

@santhoshnumberone

santhoshnumberone commented Apr 29, 2022

Also note that if you are trying to train an object detection model you should use

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

since mask_rcnn is an instance segmentation model, which will expect segmentation masks during training.

(screenshot: Mask R-CNN architecture diagram)

Can't I freeze everything apart from the object detection block using requires_grad = False and train it?

PS

The mask is required to calculate the loss, I guess. I got this error:

  cpuset_checked))
###################### [{'boxes': tensor([[132.,   0., 435., 285.]]), 'labels': tensor([1]), 'image_id': tensor([1889]), 'area': tensor([86355.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[235.,   0., 640., 315.]]), 'labels': tensor([1]), 'image_id': tensor([1210]), 'area': tensor([127575.]), 'iscrowd': tensor([0])}]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
[<ipython-input-16-05e881bbc3b2>](https://localhost:8080/#) in <module>()
      2 for epoch in range(num_epochs):
      3     # train for one epoch, printing every 10 iterations
----> 4     train_one_epoch(model, optimizer, data_loader, device, epoch,print_freq=10)
      5     # update the learning rate
      6     lr_scheduler.step()

6 frames
[/content/drive/MyDrive/PytorchObjectDetector/engine.py](https://localhost:8080/#) in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq)
     30         print("######################",targets)
     31 
---> 32         loss_dict = model(images, targets)
     33 
     34         losses = sum(loss for loss in loss_dict.values())

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py](https://localhost:8080/#) in forward(self, images, targets)
     97             features = OrderedDict([("0", features)])
     98         proposals, proposal_losses = self.rpn(images, features, targets)
---> 99         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
    100         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)  # type: ignore[operator]
    101 

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in forward(self, features, proposals, image_shapes, targets)
    743 
    744         if self.training:
--> 745             proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
    746         else:
    747             labels = None

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in select_training_samples(self, proposals, targets)
    628     ):
    629         # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor], List[Tensor]]
--> 630         self.check_targets(targets)
    631         assert targets is not None
    632         dtype = proposals[0].dtype

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in check_targets(self, targets)
    620         assert all(["labels" in t for t in targets])
    621         if self.has_mask():
--> 622             assert all(["masks" in t for t in targets])
    623 
    624     def select_training_samples(

AssertionError:

@ihebchiha123

I had the same problem. All the images and the masks were fine; for image augmentation I was using these transforms:

import torch
from torchvision.transforms import v2 as T

def get_transform(train):
    transforms = []
    if train:
        transforms.append(T.RandomHorizontalFlip(0.2))
        # transforms.append(T.RandomRotation(10))
    transforms.append(T.ToDtype(torch.float, scale=True))
    transforms.append(T.ToPureTensor())
    return T.Compose(transforms)

when "transforms.append(T.RandomRotation(10))" was uncommented, i had an error when i start the training, but when I commented that line the training step was successfully done.
