what can I do for RuntimeError: Trying to create tensor with negative dimension -1592267047: [-1592267047] #1688

Jackyinuo · 2020-12-14T11:25:56Z

❔Question

Starting training for 300 epochs...

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 0/299     5.73G   0.09355   0.08561   0.08285     0.262       154       640: 100%|██████████| 3697/3697 [37:33<00:00,  1.64it/s] 
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████| 157/157 [01:31<00:00,  1.71it/s]

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size
 1/299     6.98G   0.09522    0.1221   0.08584    0.3032       247       640: 100%|██████████| 3697/3697 [35:22<00:00,  1.74it/s]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95:   0%|          | 0/157 [00:00<?, ?it/s]

Analyzing anchors... anchors/target = 4.45, Best Possible Recall (BPR) = 0.9949
all 5e+03 3.63e+04 0.0145 0.00296 0.00248 0.000805
Traceback (most recent call last):
  File "train.py", line 503, in
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 336, in train
    results, maps, times = test.test(opt.data,
  File "/disk1/huihui/yolov5/test.py", line 120, in test
    output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, labels=lb)
  File "/disk1/huihui/yolov5/utils/general.py", line 332, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS
  File "/home/phzhou/anaconda3/envs/pt1/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 42, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
RuntimeError: Trying to create tensor with negative dimension -1592267047: [-1592267047]

Additional context

glenn-jocher · 2020-12-14T18:59:21Z

@Jackyinuo that's very strange. You may have an environment problem. I would try to reproduce your error in a verified working environment such as Google Colab or our Docker image; if the error appears there, please raise a full bug report here. I'll post our default reply below.

Hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

Your modified or out-of-date code. If your issue is not reproducible in a new git clone version of this repo we can not debug it. Before going further run this code and verify your issue persists:

$ git clone https://github.com/ultralytics/yolov5 yolov5_new  # clone latest
$ cd yolov5_new
$ python detect.py  # verify detection

# CODE TO REPRODUCE YOUR ISSUE HERE

Your custom data. If your issue is not reproducible in one of our 3 common datasets (COCO, COCO128, or VOC) we cannot debug it. Visit our Custom Training Tutorial for guidelines on training your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of your labels and images.

Your environment. If your issue is not reproducible in one of the verified environments below we cannot debug it. If you are running YOLOv5 locally, verify your environment meets all of the requirements.txt dependencies specified below. If in doubt, download Python 3.8.0 from https://www.python.org/, create a new venv, and then install requirements.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab Notebook with free GPU:
Kaggle Notebook with free GPU: https://www.kaggle.com/models/ultralytics/yolov5
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Docker Image https://hub.docker.com/r/ultralytics/yolov5. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

zhiqwang · 2020-12-22T12:32:29Z

I think this is a bug in nms; see pytorch/vision#1705.
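
The thread does not establish what triggers the failure inside `torchvision.ops.nms`. One inexpensive diagnostic, sketched here on the assumption that malformed inputs are worth ruling out, is to check the candidate boxes and scores for non-finite values and inverted corners before they reach NMS. The helper name `find_bad_boxes` is hypothetical and operates on plain Python lists of `(x1, y1, x2, y2)` tuples; it is not part of YOLOv5 or torchvision. With tensors, `torch.isfinite` would play the same role.

```python
import math


def find_bad_boxes(boxes, scores):
    """Return indices of boxes that are non-finite or have inverted corners.

    Hypothetical diagnostic helper, not part of YOLOv5 or torchvision.
    boxes: list of (x1, y1, x2, y2); scores: list of floats, same length.
    """
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    bad = []
    for i, ((x1, y1, x2, y2), s) in enumerate(zip(boxes, scores)):
        # NaN or inf anywhere in the box or its score marks it as suspect.
        finite = all(math.isfinite(v) for v in (x1, y1, x2, y2, s))
        # A box whose second corner precedes its first is also suspect.
        if not finite or x2 < x1 or y2 < y1:
            bad.append(i)
    return bad


# Illustrative data only: one valid box, one with NaN, one with x2 < x1.
boxes = [
    (0.0, 0.0, 10.0, 10.0),
    (5.0, 5.0, float("nan"), 8.0),
    (9.0, 9.0, 3.0, 12.0),
]
scores = [0.9, 0.8, 0.7]
print(find_bad_boxes(boxes, scores))
```

If a check like this flags nothing, the inputs are probably not the cause, and the linked torchvision issue or the installed library versions are the next place to look.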

github-actions · 2021-01-22T01:21:03Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tihhanovski · 2024-02-26T13:05:51Z

By the way, I have a similar problem with a custom dataset on YOLOv8.
I got this error every time I used the Apple M1 GPU: yolo detect train data=tools_v2.yaml model=yolov8n.pt epochs=100 imgsz=640 device=mps
Training was successful when I did not use the GPU: yolo detect train data=tools_v2.yaml model=yolov8n.pt epochs=100 imgsz=640

glenn-jocher · 2024-02-26T18:23:54Z

@tihhanovski it seems like you're encountering an issue that might be related to the interaction between PyTorch's NMS implementation and the MPS backend on Apple's M1 GPU. The original report involved YOLOv5 and yours involves YOLOv8, which suggests the problem is not specific to either model and could be a broader compatibility problem with MPS.

Given the reference to a PyTorch/Vision issue, it's possible that the problem lies within the underlying library rather than YOLOv5 or YOLOv8 directly. However, ensuring that you're using the latest versions of PyTorch and torchvision that support the MPS backend could potentially resolve this issue. Apple's M1 GPUs have specific requirements, and compatibility is continually improving.

For now, as a workaround, training without the GPU on the M1 (as you've done successfully) is a valid approach, albeit slower. You might also consider running your training on a different machine with a more widely supported GPU architecture (e.g., NVIDIA's CUDA) if that's an option for you.
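
A minimal sketch of that fallback logic, assuming you want CUDA when it is present and want MPS to be opt-in until it proves stable for your workload. The function name `choose_device` and its `allow_mps` flag are hypothetical; the availability flags are passed in as plain booleans so the logic can be read and run without PyTorch installed.

```python
def choose_device(cuda_available, mps_available, allow_mps=False):
    """Pick a device string, preferring CUDA and treating MPS as opt-in.

    Hypothetical helper for illustration; not part of YOLOv5 or PyTorch.
    """
    if cuda_available:
        return "cuda"
    if mps_available and allow_mps:
        return "mps"
    return "cpu"


# With PyTorch installed, the flags would typically come from
# torch.cuda.is_available() and torch.backends.mps.is_available().
print(choose_device(cuda_available=False, mps_available=True))
print(choose_device(cuda_available=False, mps_available=True, allow_mps=True))
```

The returned string can be passed to `torch.device`. Keeping MPS behind a flag makes it easy to switch back to the CPU path when an error like the one above appears.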

We appreciate your patience and understanding as these compatibility issues are worked out. The rapid development of machine learning frameworks and hardware often leads to these kinds of challenges, but they are usually resolved with time as updates are released. Keep an eye on updates from PyTorch and torchvision that might address this issue more directly.

If you haven't already, please ensure your environment is up to date with the latest versions of all relevant libraries. If the problem persists, consider raising an issue on the PyTorch GitHub to bring more attention to MPS backend compatibility problems. Your detailed feedback can help the developers prioritize and address these issues more effectively.

Thank you for your contribution to the community by highlighting this issue. Your efforts help improve the tool for everyone. 🙏

Chase-Nicholas · 2024-08-02T22:11:26Z

I had a similar issue while training on a single image. After I added more than one image, training on my M1 GPU worked.

glenn-jocher · 2024-08-03T00:09:04Z

Thank you for sharing your experience! It's interesting to hear that adding more images resolved the issue on your M1 GPU. This suggests that the problem might be related to how the MPS backend handles certain operations with very small datasets.

For anyone encountering similar issues, here are a few additional tips that might help:

Update Your Environment: Ensure you are using the latest versions of PyTorch and torchvision, as updates often include bug fixes and improvements for compatibility with different hardware, including Apple's M1 GPUs.
Batch Size and Dataset Size: As you've noted, increasing the number of images in your dataset can sometimes resolve unexpected issues. This might be due to how certain operations are optimized for larger batches or datasets.
Alternative Backends: If you continue to experience issues with the MPS backend, consider using CPU for training, as it appears to work without issues. Alternatively, if you have access to a machine with an NVIDIA GPU, using CUDA is another reliable option.
Community and Documentation: Keep an eye on the PyTorch GitHub issues and discussions for updates and potential fixes related to the MPS backend.

To upgrade to the latest versions of PyTorch and torchvision, run:

pip install --upgrade torch torchvision
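
After upgrading, it can help to confirm which versions are actually installed in the active environment, since a stale virtual environment is easy to miss. The following is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the function name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "torchvision")):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Including this output in a bug report makes it easier for maintainers to tell an environment problem apart from a library bug.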

We appreciate your patience and contributions to improving the YOLOv5 experience for everyone. If you encounter further issues or have more insights to share, please feel free to continue the discussion here. Your feedback is invaluable to the community! 😊

Thank you again, and happy training! 🚀

Jackyinuo added the question Further information is requested label Dec 14, 2020

Jackyinuo changed the title ~~RuntimeError: Trying to create tensor with negative dimension -1592267047: [-1592267047]~~ what can I do for RuntimeError: Trying to create tensor with negative dimension -1592267047: [-1592267047] Dec 14, 2020

github-actions bot added the Stale Stale and schedule for closing soon label Jan 22, 2021

github-actions bot closed this as completed Jan 28, 2021

kolbjornkelly mentioned this issue Apr 8, 2021

IMPORTANT: Fix evaluation error kolbjornkelly/tdt4265-computer-vision-and-deep-learning#8

Open
