ValueError: All bounding boxes should have positive height and width. Found invalid box [500.728515625, 533.3333129882812, 231.10546875, 255.2083282470703] for target at index 0. #2740

Closed
kashf99 opened this issue Oct 2, 2020 · 28 comments


@kashf99

kashf99 commented Oct 2, 2020

I am training Detecto for custom object detection. Can anyone help me as soon as possible? I would be very grateful.
Here is the code:
from detecto import core, utils, visualize
dataset = core.Dataset('content/sample_data/newdataset/car/images/')
model = core.Model(['car'])
model.fit(dataset)

here is the output:

ValueError Traceback (most recent call last)
in ()
4 model = core.Model(['car'])
5
----> 6 model.fit(dataset)

2 frames
/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
91 raise ValueError("All bounding boxes should have positive height and width."
92 " Found invalid box {} for target at index {}."
---> 93 .format(degen_bb, target_idx))
94
95 features = self.backbone(images.tensors)

ValueError: All bounding boxes should have positive height and width. Found invalid box [500.728515625, 533.3333129882812, 231.10546875, 255.2083282470703] for target at index 0.

@oke-aditya
Contributor

I guess you have a degenerate box case. The boxes should be in (xmin, ymin, xmax, ymax) format for Faster R-CNN to work.
Your bounding boxes appear to be in the opposite convention, which is the degenerate case.
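
A quick way to confirm this is to scan the targets for boxes that violate xmin < xmax and ymin < ymax before training. A minimal sketch in plain PyTorch (not part of Detecto's API; the helper name is my own):

import torch

def find_degenerate_boxes(boxes: torch.Tensor) -> torch.Tensor:
    """Return indices of boxes that do not satisfy xmin < xmax and ymin < ymax."""
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    return torch.where((widths <= 0) | (heights <= 0))[0]

# Example: the box from the error message, whose last two values are smaller than the first two.
bad = torch.tensor([[500.73, 533.33, 231.11, 255.21]])
print(find_degenerate_boxes(bad))  # tensor([0]) -> degenerate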

@fmassa
Member

fmassa commented Oct 2, 2020

Hi,

The answer from @oke-aditya is correct. You are probably passing to the model bounding boxes in the format [xmin, ymin, width, height], while Faster R-CNN expects boxes to be in [xmin, ymin, xmax, ymax] format.

Changing this should fix the issue.

We have, by the way, recently added box-conversion utilities to torchvision (thanks to @oke-aditya); they can be found in:

def box_convert(boxes: Tensor, in_fmt: str, out_fmt: str) -> Tensor:
    """
    Converts boxes from given in_fmt to out_fmt.
    Supported in_fmt and out_fmt are:
    'xyxy': boxes are represented via corners, x1, y1 being top left and x2, y2 being bottom right.
    'xywh': boxes are represented via corner, width and height, x1, y1 being top left, w, h being width and height.
    'cxcywh': boxes are represented via centre, width and height, cx, cy being center of box, w, h being width and height.

    Arguments:
        boxes (Tensor[N, 4]): boxes which will be converted.
        in_fmt (str): Input format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].
        out_fmt (str): Output format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].

    Returns:
        boxes (Tensor[N, 4]): Boxes in the converted format.
    """

@kashf99
Author

kashf99 commented Oct 2, 2020

So should I change my XML file format?

@fmassa
Member

fmassa commented Oct 2, 2020

@kashf99 this question is better suited to the detecto repo, and this is part of their API. https://github.com/alankbi/detecto

@kashf99
Author

kashf99 commented Oct 2, 2020

Ok thank you

@kashf99
Author

kashf99 commented Oct 2, 2020

I guess you have a degenerate box case. The boxes should be in (xmin, ymin, xmax, ymax) format for Faster R-CNN to work.
Your bounding boxes appear to be in the opposite convention, which is the degenerate case.

Yeah, thank you, it worked. But it is very slow, and I get a warning: "Overload of nonzero is deprecated."

@fmassa
Member

fmassa commented Oct 2, 2020

Overload of nonzero is deprecated.

This has been fixed in torchvision master since #2705

@MALLI7622

MALLI7622 commented Jan 13, 2021

Hi @fmassa. I am also getting the same error, but I passed [xmin, ymin, xmax, ymax] to the model. Can someone help me out?

@oke-aditya
Contributor

Can you post details so that we can reproduce the issue?

@MALLI7622

@oke-aditya What should I share, code or abstract details?

@oke-aditya
Contributor

Any code sample that can help people reproduce the error you get.

@MALLI7622

boxes.append([xmin, ymin, xmax, ymax])
boxes = torch.as_tensor(boxes, dtype=torch.float32)
These are the box coordinates I'm passing.

@fmassa
Member

fmassa commented Jan 20, 2021

@MALLI7622 make sure that xmin < xmax and that ymin < ymax for all boxes
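
A minimal sketch of such a check, run once over a dataset before training (it assumes the usual torchvision convention where each sample is an (image, target) pair and target["boxes"] holds the boxes):

def check_boxes(dataset):
    """Print every sample whose boxes violate xmin < xmax or ymin < ymax."""
    for idx in range(len(dataset)):
        _, target = dataset[idx]
        boxes = target["boxes"]
        bad = (boxes[:, 2] <= boxes[:, 0]) | (boxes[:, 3] <= boxes[:, 1])
        if bad.any():
            print(f"sample {idx}: invalid boxes {boxes[bad].tolist()}")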

@MALLI7622

MALLI7622 commented Jan 21, 2021

@fmassa I resolved the issue 4 days back, thanks for your help. Now I am hitting another problem with Faster R-CNN: my model produces the values below and I don't know how to resolve this. I changed the class indices to start from 1 instead of 0 and increased the number of output classes by 1 because of starting with 1. Can you help me resolve this issue?
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

When I predict with this model, I don't get anything. It predicts this:
[{'boxes': tensor([], device='cuda:0', size=(0, 4)),
'labels': tensor([], device='cuda:0', dtype=torch.int64),
'scores': tensor([], device='cuda:0')}]

@fmassa
Member

fmassa commented Jan 21, 2021

@MALLI7622 this might be due to many things. I would encourage you to start with the finetuning tutorial in https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html , as maybe you are not training for long enough.

@clothme-io

@MALLI7622 How did you resolve the issue? I am having a similar issue with a custom dataset of 39 classes (including background). Any help will do. Thanks.

@MALLI7622

@clothme-io Can you share your sample dataset file and also your custom dataset class? I'll try to help you with it.

@clothme-io

@MALLI7622 Sure, I can share it here as well as email it to you. And thank you for the help.

How I Generated The Dataset:

  1. Annotated the image with labelme (multiple parts in a single image).

  2. Generated a mask image (image below) from the annotated image.
    (mask image: person95)

  3. Then I used the code here: to generate the segmentation images (image below) that I loaded into the model.
    (segmentation image: person95)

Here is my custom dataset class:
class PersonDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "seg_image_use"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "seg_mask_use"))))

    def __getitem__(self, idx):
        # load one image and mask using idx
        img_path = os.path.join(self.root, "seg_image_use", self.imgs[idx])
        mask_path = os.path.join(self.root, "seg_mask_use", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)

        mask = np.asarray(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)[1:]  # first id is the background, so remove it
        masks = mask == obj_ids[:, None, None]  # split the color-encoded mask into a set of binary masks
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []

        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        target = {}
        target["boxes"] = boxes
        target["labels"] = torch.as_tensor(obj_ids, dtype=torch.int64) - 1
        target["masks"] = torch.as_tensor(masks, dtype=torch.uint8)
        target["image_id"] = torch.tensor([idx])
        target["area"] = area
        target["iscrowd"] = torch.zeros((num_objs,), dtype=torch.int64)  # suppose all instances are not crowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
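
A common cause of this error with mask-derived boxes is an instance mask that is only one pixel wide or tall, which yields xmin == xmax or ymin == ymax. A minimal, self-contained sketch of a helper that __getitem__ could use to drop such instances (the helper name masks_to_valid_boxes is mine, not part of the original code):

import numpy as np
import torch

def masks_to_valid_boxes(masks):
    """Compute (xmin, ymin, xmax, ymax) boxes from binary masks of shape (N, H, W),
    dropping instances whose box would have zero width or height."""
    boxes, keep = [], []
    for i, m in enumerate(masks):
        pos = np.where(m)
        xmin, xmax = np.min(pos[1]), np.max(pos[1])
        ymin, ymax = np.min(pos[0]), np.max(pos[0])
        if xmax > xmin and ymax > ymin:  # keep only boxes with positive width and height
            boxes.append([xmin, ymin, xmax, ymax])
            keep.append(i)
    return torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4), keep

Inside __getitem__ you would then index masks, obj_ids and the other target fields with keep so that everything stays aligned.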

@OrielBanne

Hi -

the example in torchvision is:

model22 = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

For training

images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)
output = model22(images, targets)

For inference

model22.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model22(x)

optionally, if you want to export the model to ONNX:

torch.onnx.export(model22, x, "faster_rcnn.onnx", opset_version = 11)

https://pytorch.org/vision/master/models.html#torchvision.models.detection.fasterrcnn_resnet50_fpn

and I get the same error:

ValueError: All bounding boxes should have positive height and width. Found invalid box [0.5358670949935913, 0.6406093239784241, 0.873319149017334, 0.33925700187683105] for target at index 0.

@fmassa
Member

fmassa commented Aug 13, 2021

@OrielBanne One of your bounding boxes has a negative height; I would recommend checking your training data.

@mrinath123
Contributor

@OrielBanne Yes, I got the same error while using this; maybe producing random bboxes (torch.rand(4, 11, 4)) is creating the problem.
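
With torch.rand there is indeed no guarantee that the second corner lies below and to the right of the first one. One way to make that smoke test produce well-formed boxes (a sketch, not the official docs example; the sizes are arbitrary):

import torch

images = [torch.rand(3, 600, 1200) for _ in range(4)]
labels = torch.randint(1, 91, (4, 11))

# Build boxes whose second corner is strictly below and to the right of the first one.
xy_min = torch.rand(4, 11, 2) * 256        # arbitrary top-left corners, in pixels
wh = torch.rand(4, 11, 2) * 128 + 1.0      # strictly positive widths and heights
boxes = torch.cat([xy_min, xy_min + wh], dim=-1)  # (xmin, ymin, xmax, ymax)

targets = [{"boxes": boxes[i], "labels": labels[i]} for i in range(len(images))]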

@Esraanageh22

I have the same error (screenshot attached), and I have checked the data (screenshots attached).

@santhoshnumberone

santhoshnumberone commented Apr 29, 2022

I have a similar issue

Following this tutorial, "Building Your Own Object Detector Pytorch Vs Tensorflow And How To Even Get Started", to use transfer learning to train a custom dataset.

Running on Google Colab using CPU
Pytorch version: 1.11.0+cu113
Python version: Python 3.7.13

Cloned the GitHub repo of pytorch/vision as mentioned and copy-pasted the version 0.3.3 files from vision/references/detection into the working directory:

references/detection/utils.py ../
references/detection/transforms.py ../
references/detection/coco_eval.py ../
references/detection/engine.py ../
references/detection/coco_utils.py ../

The model I am using:

# load an object detection model pre-trained on COCO
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

I manually checked the CSV file to see if any bounding box values were negative, but I couldn't find any.

I added a print statement inside engine.py, where the error points, to check for negative bounding box values:

    for images, targets in metric_logger.log_every(data_loader, print_freq, header):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        print("######################",targets)

        loss_dict = model(images, targets)

Print-statement output of the targets where the error points; I don't see a single negative value:

###################### [{'boxes': tensor([[ 98., 672., 829., 864.]]), 'labels': tensor([1]), 'image_id': tensor([734]), 'area': tensor([140352.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[262.,  85., 463., 275.]]), 'labels': tensor([1]), 'image_id': tensor([110]), 'area': tensor([38190.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 82., 275., 259., 281.]]), 'labels': tensor([1]), 'image_id': tensor([296]), 'area': tensor([1062.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 85.,   0., 357., 238.]]), 'labels': tensor([1]), 'image_id': tensor([68]), 'area': tensor([64736.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[188., 400., 730., 880.]]), 'labels': tensor([1]), 'image_id': tensor([788]), 'area': tensor([260160.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 40., 118., 320., 155.]]), 'labels': tensor([1]), 'image_id': tensor([598]), 'area': tensor([10360.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0., 245., 293., 347.]]), 'labels': tensor([1]), 'image_id': tensor([605]), 'area': tensor([29886.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[201., 838., 611., 621.]]), 'labels': tensor([1]), 'image_id': tensor([696]), 'area': tensor([-88970.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[488., 669., 774., 541.]]), 'labels': tensor([1]), 'image_id': tensor([985]), 'area': tensor([-36608.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[129., 242., 138., 119.]]), 'labels': tensor([1]), 'image_id': tensor([813]), 'area': tensor([-1107.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 36.,  77., 258., 247.]]), 'labels': tensor([1]), 'image_id': tensor([1780]), 'area': tensor([37740.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 66.,  49., 308., 283.]]), 'labels': tensor([1]), 'image_id': tensor([868]), 'area': tensor([56628.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 23., 182., 343., 318.]]), 'labels': tensor([1]), 'image_id': tensor([1290]), 'area': tensor([43520.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[215.,   0., 500., 266.]]), 'labels': tensor([1]), 'image_id': tensor([111]), 'area': tensor([75810.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 99., 105., 349., 210.]]), 'labels': tensor([1]), 'image_id': tensor([1350]), 'area': tensor([26250.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[319., 842., 384., 541.]]), 'labels': tensor([1]), 'image_id': tensor([803]), 'area': tensor([-19565.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0.,  19., 269., 283.]]), 'labels': tensor([1]), 'image_id': tensor([409]), 'area': tensor([71016.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 31.,   0., 360., 339.]]), 'labels': tensor([1]), 'image_id': tensor([1651]), 'area': tensor([111531.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0., 714., 585., 646.]]), 'labels': tensor([1]), 'image_id': tensor([989]), 'area': tensor([-39780.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 51., 170., 314., 317.]]), 'labels': tensor([1]), 'image_id': tensor([1449]), 'area': tensor([38661.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[394.,  66., 640., 294.]]), 'labels': tensor([1]), 'image_id': tensor([177]), 'area': tensor([56088.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[396., 723., 592., 627.]]), 'labels': tensor([1]), 'image_id': tensor([940]), 'area': tensor([-18816.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 95.,  54., 360., 187.]]), 'labels': tensor([1]), 'image_id': tensor([1579]), 'area': tensor([35245.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 67., 112., 293., 307.]]), 'labels': tensor([1]), 'image_id': tensor([1508]), 'area': tensor([44070.]), 'iscrowd': 
tensor([0])}, {'boxes': tensor([[ 11.,   0., 452., 355.]]), 'labels': tensor([1]), 'image_id': tensor([1162]), 'area': tensor([156555.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[268., 515., 698., 746.]]), 'labels': tensor([1]), 'image_id': tensor([741]), 'area': tensor([99330.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[156., 851., 598., 624.]]), 'labels': tensor([1]), 'image_id': tensor([900]), 'area': tensor([-100334.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 44., 123., 341., 305.]]), 'labels': tensor([1]), 'image_id': tensor([680]), 'area': tensor([54054.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[235.,   0., 598., 282.]]), 'labels': tensor([1]), 'image_id': tensor([1163]), 'area': tensor([102366.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 43., 156., 277., 289.]]), 'labels': tensor([1]), 'image_id': tensor([360]), 'area': tensor([31122.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 94.,   0., 266., 250.]]), 'labels': tensor([1]), 'image_id': tensor([1591]), 'area': tensor([43000.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 71.,  38., 343., 322.]]), 'labels': tensor([1]), 'image_id': tensor([1809]), 'area': tensor([77248.]), 'iscrowd': tensor([0])}]

I get this error:

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
###################### [targets printout identical to the one shown above]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-13-f8100031e21d>](https://localhost:8080/#) in <module>()
      2 for epoch in range(num_epochs):
      3     # train for one epoch, printing every 10 iterations
----> 4     train_one_epoch(model, optimizer, data_loader, device, epoch,print_freq=10)
      5     # update the learning rate
      6     lr_scheduler.step()

2 frames
[/content/drive/MyDrive/PytorchObjectDetector/engine.py](https://localhost:8080/#) in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq)
     30         print("######################",targets)
     31 
---> 32         loss_dict = model(images, targets)
     33 
     34         losses = sum(loss for loss in loss_dict.values())

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py](https://localhost:8080/#) in forward(self, images, targets)
     89                     degen_bb: List[float] = boxes[bb_idx].tolist()
     90                     raise ValueError(
---> 91                         "All bounding boxes should have positive height and width."
     92                         f" Found invalid box {degen_bb} for target at index {target_idx}."
     93                     )

ValueError: All bounding boxes should have positive height and width. Found invalid box [139.397216796875, 581.7989501953125, 423.73980712890625, 431.1422119140625] for target at index 7.

I am sure the problem has been addressed long back, judging by the responses given here.

But I looked at this post on Stack Overflow suffering from the same error: ValueError: All bounding boxes should have positive height and width

Could any of you guide me on what exactly should be changed, and where?

I will surely write a Medium blog on PyTorch object detection from custom data using transfer learning after I have sorted out these few minor hiccups.

@fmassa I guess you could help me sort this issue out.

@abhi-glitchhg
Contributor

Hey @santhoshnumberone ,
refer to @oke-aditya's comment here: #2740 (comment). The bounding boxes should be in the form (xmin, ymin, xmax, ymax).

In your bounding box data there are a few data points which do not fit the above format; some of them are:

tensor([[201., 838., 611., 621.]])
tensor([[488., 669., 774., 541.]])
tensor([[129., 242., 138., 119.]])
tensor([[319., 842., 384., 541.]])
tensor([[  0., 714., 585., 646.]])
tensor([[396., 723., 592., 627.]])
tensor([[156., 851., 598., 624.]])

So first you need to check the format of the bounding boxes that you have, and then convert them to (xmin, ymin, xmax, ymax) format.
This function might be helpful for converting the bounding boxes:

def box_convert(boxes: Tensor, in_fmt: str, out_fmt: str) -> Tensor:
    """
    Converts boxes from given in_fmt to out_fmt.
    Supported in_fmt and out_fmt are:
    'xyxy': boxes are represented via corners, x1, y1 being top left and x2, y2 being bottom right.
    'xywh': boxes are represented via corner, width and height, x1, y1 being top left, w, h being width and height.
    'cxcywh': boxes are represented via centre, width and height, cx, cy being center of box, w, h being width and height.

    Arguments:
        boxes (Tensor[N, 4]): boxes which will be converted.
        in_fmt (str): Input format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].
        out_fmt (str): Output format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].

    Returns:
        boxes (Tensor[N, 4]): Boxes in the converted format.
    """
    allowed_fmts = ("xyxy", "xywh", "cxcywh")
    assert in_fmt in allowed_fmts
    assert out_fmt in allowed_fmts
    if in_fmt == out_fmt:
        boxes_converted = boxes.clone()
        return boxes_converted
    if in_fmt != 'xyxy' and out_fmt != 'xyxy':
        if in_fmt == "xywh":
            boxes_xyxy = _box_xywh_to_xyxy(boxes)
            if out_fmt == "cxcywh":
                boxes_converted = _box_xyxy_to_cxcywh(boxes_xyxy)
        elif in_fmt == "cxcywh":
            boxes_xyxy = _box_cxcywh_to_xyxy(boxes)
            if out_fmt == "xywh":
                boxes_converted = _box_xyxy_to_xywh(boxes_xyxy)
    # convert one to xyxy and change either in_fmt or out_fmt to xyxy
    else:
        if in_fmt == "xyxy":
            if out_fmt == "xywh":
                boxes_converted = _box_xyxy_to_xywh(boxes)
            elif out_fmt == "cxcywh":
                boxes_converted = _box_xyxy_to_cxcywh(boxes)
        elif out_fmt == "xyxy":
            if in_fmt == "xywh":
                boxes_converted = _box_xywh_to_xyxy(boxes)
            elif in_fmt == "cxcywh":
                boxes_converted = _box_cxcywh_to_xyxy(boxes)
    return boxes_converted
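
In this particular dump the x coordinates already satisfy xmin < xmax and only the y coordinates look swapped, so another option (a sketch, assuming the values really are swapped corners rather than a different box format) is to reorder the corners explicitly:

import torch

def reorder_corners(boxes: torch.Tensor) -> torch.Tensor:
    """Enforce (xmin, ymin, xmax, ymax) ordering per coordinate pair."""
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack(
        [torch.min(x1, x2), torch.min(y1, y2), torch.max(x1, x2), torch.max(y1, y2)],
        dim=-1,
    )

print(reorder_corners(torch.tensor([[201., 838., 611., 621.]])))
# tensor([[201., 621., 611., 838.]])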

I hope this helps.

@oke-aditya
Contributor

Also note that if you are trying to train an object detection model you should use

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

since mask_rcnn is an instance segmentation model, which will expect segmentation masks during training.
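
If you go that route with a custom number of classes, the finetuning tutorial's recipe is roughly the following (a sketch; num_classes = 2 is just an assumption for one foreground class plus background):

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 2  # 1 object class + background (assumption for this sketch)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# Replace the pre-trained box predictor head with one sized for num_classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)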

@santhoshnumberone

santhoshnumberone commented Apr 29, 2022

Hey @santhoshnumberone, refer to @oke-aditya's comment here: #2740 (comment). The bounding boxes should be in the form (xmin, ymin, xmax, ymax).

In your bounding box data there are a few data points which do not fit the above format; some of them are:

tensor([[201., 838., 611., 621.]])
tensor([[488., 669., 774., 541.]])
tensor([[129., 242., 138., 119.]])
tensor([[319., 842., 384., 541.]])
tensor([[  0., 714., 585., 646.]])
tensor([[396., 723., 592., 627.]])
tensor([[156., 851., 598., 624.]])

So first you need to check the format of the bounding boxes that you have, and then convert them to (xmin, ymin, xmax, ymax) format. This function might be helpful for converting the bounding boxes.

I hope this helps.

Thank you for highlighting the issue, will look into it.
I blindly trusted a popular online image labelling tool to annotate my custom data

@santhoshnumberone

santhoshnumberone commented Apr 29, 2022

Also note that if you are trying to train an object detection model you should use

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

since mask_rcnn is an instance segmentation model, which will expect segmentation masks during training.

(screenshot: Mask R-CNN architecture diagram)

Can't I freeze everything apart from the object detection block using requires_grad = False and train it?

PS

The mask is required to calculate the loss, I guess. I got this error:

  cpuset_checked))
###################### [{'boxes': tensor([[132.,   0., 435., 285.]]), 'labels': tensor([1]), 'image_id': tensor([1889]), 'area': tensor([86355.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[235.,   0., 640., 315.]]), 'labels': tensor([1]), 'image_id': tensor([1210]), 'area': tensor([127575.]), 'iscrowd': tensor([0])}]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
[<ipython-input-16-05e881bbc3b2>](https://localhost:8080/#) in <module>()
      2 for epoch in range(num_epochs):
      3     # train for one epoch, printing every 10 iterations
----> 4     train_one_epoch(model, optimizer, data_loader, device, epoch,print_freq=10)
      5     # update the learning rate
      6     lr_scheduler.step()

6 frames
[/content/drive/MyDrive/PytorchObjectDetector/engine.py](https://localhost:8080/#) in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq)
     30         print("######################",targets)
     31 
---> 32         loss_dict = model(images, targets)
     33 
     34         losses = sum(loss for loss in loss_dict.values())

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py](https://localhost:8080/#) in forward(self, images, targets)
     97             features = OrderedDict([("0", features)])
     98         proposals, proposal_losses = self.rpn(images, features, targets)
---> 99         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
    100         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)  # type: ignore[operator]
    101 

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in forward(self, features, proposals, image_shapes, targets)
    743 
    744         if self.training:
--> 745             proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
    746         else:
    747             labels = None

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in select_training_samples(self, proposals, targets)
    628     ):
    629         # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor], List[Tensor]]
--> 630         self.check_targets(targets)
    631         assert targets is not None
    632         dtype = proposals[0].dtype

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in check_targets(self, targets)
    620         assert all(["labels" in t for t in targets])
    621         if self.has_mask():
--> 622             assert all(["masks" in t for t in targets])
    623 
    624     def select_training_samples(

AssertionError:

@ihebchiha123

I had the same problem. All the images and the masks were fine; for image augmentation I was using these transforms:

import torch
from torchvision.transforms import v2 as T

def get_transform(train):
    transforms = []
    if train:
        transforms.append(T.RandomHorizontalFlip(0.2))
        # transforms.append(T.RandomRotation(10))
    transforms.append(T.ToDtype(torch.float, scale=True))
    transforms.append(T.ToPureTensor())
    return T.Compose(transforms)

when "transforms.append(T.RandomRotation(10))" was uncommented, i had an error when i start the training, but when I commented that line the training step was successfully done.
