Added Benchmarking CI with --hard-fail flag #7915
Conversation
Force-pushed from fc9f15a to ff99d04: "…INO and Tensorflowjs"
@JWLee89 thanks for the PR. We already have most export verification functionality included in utils/benchmarks.py, i.e. see #6613. This file runs all possible exports and checks export success, mAP, and speed if the exports succeed. We want to use this existing code as the basis for any CI updates, making just a few small changes to create hard rather than soft failures, i.e. `python utils/benchmarks.py --hard-fail`, and to also assert output mAPs above a threshold.

Colab Pro+ High-RAM CPU Results:
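As a rough illustration of the flow described above, here is a condensed sketch. The real loop lives in utils/benchmarks.py; `export_model` and `validate` below are hypothetical stand-ins for the actual export and validation calls, stubbed so the sketch runs:

```python
# Condensed sketch of the benchmark flow: try every export format, validate
# each successful export, and record soft failures instead of aborting.
EXPORT_FORMATS = ["TorchScript", "ONNX", "OpenVINO", "CoreML", "TFLite"]

def export_model(weights, fmt):  # hypothetical stand-in for the real export call
    return f"{weights}.{fmt.lower()}"

def validate(model_file):  # hypothetical stand-in for the real validation call
    return 0.45, 100.0  # (mAP@0.5:0.95, inference time in ms)

def benchmark(weights):
    results = []  # rows: (format, export_ok, mAP, inference_time)
    for fmt in EXPORT_FORMATS:
        try:
            model_file = export_model(weights, fmt)
            map50_95, speed = validate(model_file)
            results.append((fmt, True, map50_95, speed))
        except Exception:
            results.append((fmt, False, None, None))  # soft failure by default
    return results
```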
About deleting exported models, I'm not clear on the reason you want to delete them, but if you want to, you can simply point utils/benchmarks.py --weights to a new directory. The PyTorch model automatically downloads to this directory and all exports are located alongside it. After CI you can simply delete the directory, i.e.:

```bash
python utils/benchmarks.py --weights weights/yolov5n.pt
rm -rf weights/
```
Thank you for the clarification. I will update this PR to match the specifications by modifying utils/benchmarks.py. I just ran the test on my local device and it seems that this does take some time to run, so I am planning on creating a separate workflow. I will ping you when the updated code is ready for review.
I wanted to delete exported models so that, in case we run the CI workflow on a self-hosted server, we don't leave dangling files on the server's local filesystem. Some questions:
```yaml
# Download script/URL (optional)
download: https://ultralytics.com/assets/coco128.zip

# Benchmark values
benchmarks:
  # Metrics, e.g. mAP
  mAP:
    # Img size 640
    yolov5n: 0.45
    yolov5s: 0.45
    yolov5m: 0.45
    yolov5l: 0.45
    yolov5x: 0.45
    # Img size 1280
    yolov5n6: 0.5
    yolov5s6: 0.5
    yolov5m6: 0.6
    yolov5l6: 0.6
    yolov5x6: 0.6
```

For now, if the `benchmarks` property does not exist in the YAML file, we will skip the assertions.
@glenn-jocher I have added the assertions to check whether mAP lies above a certain threshold. The test runs in a CI workflow that invokes `python utils/benchmarks.py --weights ${{ matrix.model }}.pt --hard-fail`.

Summary of implementation details:
The mAP threshold check can be seen below. I applied a threshold value of 0.8 to show that it raises the appropriate error when a model does not meet the threshold requirements.
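A minimal sketch of the hard-fail behavior described here, reusing the `results` rows from the earlier export-loop sketch; `hard_fail` mirrors the --hard-fail flag (illustrative, not the PR's exact implementation):

```python
def assert_benchmarks(results, threshold, hard_fail=False):
    """Raise on export failure or sub-threshold mAP when hard_fail is set."""
    for fmt, export_ok, map50_95, _speed in results:
        if not export_ok:
            if hard_fail:
                raise RuntimeError(f"{fmt}: export or validation failed")
            continue  # soft failure: keep checking the remaining formats
        if threshold is not None and map50_95 < threshold:
            msg = f"{fmt}: mAP {map50_95:.3f} below threshold {threshold}"
            if hard_fail:
                raise RuntimeError(msg)  # non-zero exit fails the CI job
            print(f"WARNING: {msg}")
```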
Force-pushed from 53c6f08 to ca3d4aa
Force-pushed from d1145dd to ea77cc2
Force-pushed from 1c854b9 to 3fc7703: "…s and other human error"
@glenn-jocher When you have some time to spare, could you please take a look at the PR? I noticed that in #6613 the benchmarks are performed using yolov5s. Right now, the automation for model export failure and the mAP threshold check is being done for both. For additional information, please feel free to ask; details are also recorded in my previous comments. Thank you for taking time out of your schedule to look at the PR.
@JWLee89 yes, I've got this as a TODO, will get to it soon!
@glenn-jocher Thank you for letting me know!
@JWLee89 I've added benchmarking by default to all CI runs now in #7996 (yolov5/.github/workflows/ci-testing.yml, lines 40 to 43 at 09ba6f6).
This runs YOLOv5n at 320 and should produce these results:
@glenn-jocher Thank you for the follow-up! I noticed that you removed the …, and right now the CI benchmark testing is failing, because the mAP 0.5:0.95 for …

Is there any specific threshold value that you have in mind for …? If the mAP threshold check feature is not needed, I can also remove or comment out the code from this PR. Let me know. Thanks!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions YOLOv5 🚀 and Vision AI ⭐.
👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap. We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved. For additional resources and information, please see the links below:
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
@JWLee89 The …

Regarding the mAP threshold issue for …

Your efforts to improve the benchmarking process are valuable, and I appreciate your attention to detail.
@glenn-jocher Thank you for the update, and not a problem. If there are other high-priority or useful items that need to be worked on, I am more than happy to work on some of them. Since I have not been actively working on YOLO for a while, I might need a day or two to read through the project and source code again to get myself re-acquainted.
@JWLee89 Your willingness to contribute is greatly appreciated! Re-familiarizing yourself with the project sounds like a solid plan. Feel free to reach out if you have any questions or need assistance with anything. Your valuable insights and contributions are always welcome. Thank you for your support!
This PR is motivated by a quote in the following issue (#7870).
For now, this PR does not incorporate separate actions and benchmarking, to keep the PR compact, since adding all of these tests in a single PR would result in major changes, making it difficult to review.
Question regarding separate CI workflows for exports and benchmarking:
Should the CI workflows be run on all three OSes (Windows, Linux, macOS)? The reason I ask is that, as far as I know, TensorRT does not play well with macOS. Tests may be easier if we target a platform such as Ubuntu 18.04 or 20.04 and run the CI workflow inside a Docker container.
Changes

- Modified `ci-testing.yml`: replaced `python export.py --weights ${{ matrix.model }}.pt --img 64 --include torchscript onnx  # export` with pytest.
- Added `test_export.py`.
- Added `pytest` to `requirements.txt`.
- Added `conftest.py` to accept the pytest command line argument `--weights` (the path to the PyTorch `.pt` file); a sketch follows after this list.
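A minimal sketch of the `conftest.py` hook described in the last item above (the option name follows the PR description; the fixture body is illustrative):

```python
import pytest

def pytest_addoption(parser):
    # Register a custom CLI option, e.g.:
    #   pytest test_export.py --weights yolov5n.pt
    parser.addoption("--weights", action="store", default="yolov5s.pt",
                     help="path to the PyTorch .pt weights file")

@pytest.fixture
def weights(request):
    # Tests that declare a `weights` argument receive the path.
    return request.config.getoption("--weights")
```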
Future Works
Discovered issues

- `tensorflowjs` and OpenVINO do not play well together, due to OpenVINO supporting `numpy` ver < 1.20. `tensorflowjs`, on the other hand, does not work with `numpy` ver < 1.20 (a quick version check is sketched after this list).
- `check_requirements(('tensorflowjs',))` sometimes fails when installing `tensorflowjs`. The root cause is currently unknown, but I believe that we can reproduce it by following the GitHub workflow inside a Docker container.
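A quick, illustrative check of why the two constraints above are mutually exclusive (assumes the `packaging` package is installed):

```python
import numpy as np
from packaging.version import Version

v = Version(np.__version__)
openvino_ok = v < Version("1.20")   # OpenVINO constraint: numpy < 1.20
tfjs_ok = v >= Version("1.20")      # tensorflowjs constraint: numpy >= 1.20
print(f"numpy {v}: OpenVINO-compatible={openvino_ok}, tfjs-compatible={tfjs_ok}")
# Both can never be True in the same environment, hence the conflict.
```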
🛠️ PR Summary

Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced benchmark integrity and added benchmark validation for model exports in YOLOv5.
📊 Key Changes
- Added a `--hard-fail` option to the benchmarking script to enforce stricter validation.
- Added benchmark (mAP) values to the `coco128.yaml` file for different models and image sizes.

🎯 Purpose & Impact
The changes primarily impact developers conducting benchmarks for YOLOv5 models, but they also benefit users by ensuring high-quality model performance standards.