Streaming export for YOLO and COCO formats #9084

Merged on Mar 14, 2025 (36 commits)

@Eldies (Contributor) commented Feb 10, 2025

Motivation and context

Depends on cvat-ai/datumaro#81, cvat-ai/datumaro#90, cvat-ai/datumaro#94, #9209

When a dataset is exported, all of its annotations are held in RAM at once. This can be a problem when the annotations are large.

  • Added support for streaming dataset export for tasks and jobs
  • Switched to streaming in COCO and YOLO export formats
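
To illustrate the idea behind streaming export, here is a minimal sketch, not the code from this PR. It assumes a hypothetical lazy annotation source (`iter_boxes`) and a hypothetical writer (`write_yolo_labels`) that emits YOLO-style label lines (`class cx cy w h`, normalized to the image size) one at a time, so the full annotation list never has to sit in memory.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class Box(NamedTuple):
    """Hypothetical annotation record: top-left corner and size in pixels."""
    label_id: int
    x: float
    y: float
    w: float
    h: float


def iter_boxes(n: int) -> Iterator[Box]:
    """Hypothetical lazy source: yields one box at a time instead of building a list."""
    for i in range(n):
        yield Box(label_id=i % 3, x=10.0 * i, y=5.0 * i, w=20.0, h=10.0)


def write_yolo_labels(boxes: Iterable[Box], img_w: int, img_h: int, out: TextIO) -> int:
    """Write each box as a normalized YOLO label line as soon as it is produced.

    Only the current box is held in memory; returns the number of lines written.
    """
    count = 0
    for b in boxes:
        cx = (b.x + b.w / 2) / img_w  # normalized box center, x
        cy = (b.y + b.h / 2) / img_h  # normalized box center, y
        out.write(
            f"{b.label_id} {cx:.6f} {cy:.6f} {b.w / img_w:.6f} {b.h / img_h:.6f}\n"
        )
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_yolo_labels(iter_boxes(3), img_w=100, img_h=100, out=buf)
    print(buf.getvalue(), end="")
```

The writer accepts any iterable, so the same function works with a list or a generator; the memory benefit comes only from passing a lazy source.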

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@Eldies force-pushed the dl/stream-export branch 6 times, most recently from bdda455 to fa321ce, on February 12, 2025 06:37
@Eldies changed the base branch from develop to dl/update-datumaro on February 12, 2025 06:38
@codecov-commenter commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 97.36842% with 1 line in your changes missing coverage. Please review.

Project coverage is 73.40%. Comparing base (39bcc6a) to head (6779f37).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9084      +/-   ##
===========================================
+ Coverage    73.38%   73.40%   +0.02%     
===========================================
  Files          450      450              
  Lines        45847    45866      +19     
  Branches      3917     3917              
===========================================
+ Hits         33645    33670      +25     
+ Misses       12202    12196       -6     
Components Coverage Δ
cvat-ui 77.12% <ø> (+0.03%) ⬆️
cvat-server 70.43% <97.36%> (+0.01%) ⬆️

# Conflicts:
#	cvat/requirements/base.in
#	cvat/requirements/base.txt
Base automatically changed from dl/update-datumaro to develop March 4, 2025 14:54
Eldies added 4 commits March 4, 2025 16:09
# Conflicts:
#	cvat/apps/dataset_manager/bindings.py
#	cvat/apps/dataset_manager/formats/coco.py
#	cvat/apps/dataset_manager/formats/imagenet.py
#	cvat/apps/dataset_manager/formats/yolo.py
#	cvat/requirements/base.in
#	cvat/requirements/base.txt
@Eldies Eldies requested a review from nmanovic as a code owner March 4, 2025 20:56
@zhiltsov-max (Contributor) commented Mar 6, 2025

I've noticed that export in streaming mode is slower for COCO formats, by about 2-3x in my cases. As I understand it, the problem is in coco/exporter.py:142, where the standard json library is used instead of the optimized orjson. I suppose orjson isn't directly compatible with the JSON stream used in the current implementation, but orjson does have other relevant functionality: https://github.com/ijl/orjson?tab=readme-ov-file#fragment. Please look into this.
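
One possible way to combine a streamed JSON document with a faster serializer is sketched below. This is an assumption-laden illustration, not the Datumaro exporter: the function names (`write_coco_stream`, `_write_items`) are hypothetical, and instead of the orjson fragment feature linked above it uses a simpler technique, writing the outer JSON structure by hand and serializing each item individually. It uses `orjson.dumps` when orjson is installed and falls back to the standard `json` module otherwise. Whether this recovers the lost performance has not been measured here.

```python
import io
import json
from typing import Any, BinaryIO, Iterable

try:
    import orjson  # optional dependency; assumed available in the real project

    def dumps(obj: Any) -> bytes:
        return orjson.dumps(obj)
except ImportError:
    def dumps(obj: Any) -> bytes:
        # Fallback so the sketch runs without orjson installed.
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def _write_items(items: Iterable[dict], out: BinaryIO) -> None:
    """Write items as comma-separated JSON values, one serialized item at a time."""
    first = True
    for item in items:
        if not first:
            out.write(b",")
        out.write(dumps(item))
        first = False


def write_coco_stream(
    images: Iterable[dict], annotations: Iterable[dict], out: BinaryIO
) -> None:
    """Emit a COCO-like document without building the full structure in memory."""
    out.write(b'{"images":[')
    _write_items(images, out)
    out.write(b'],"annotations":[')
    _write_items(annotations, out)
    out.write(b"]}")


if __name__ == "__main__":
    buf = io.BytesIO()
    write_coco_stream(
        images=({"id": i, "file_name": f"{i}.jpg"} for i in range(2)),
        annotations=({"id": i, "image_id": 0, "bbox": [0, 0, 10, 10]} for i in range(3)),
        out=buf,
    )
    print(json.loads(buf.getvalue())["annotations"][0])
```

Only the items are passed through the serializer; the surrounding braces, keys, and commas are fixed byte strings, so the output stays valid JSON regardless of which serializer is used.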

@zhiltsov-max changed the title from "streaming export" to "Streaming export for YOLO and COCO formats" on Mar 14, 2025
@zhiltsov-max zhiltsov-max merged commit 0b619c8 into develop Mar 14, 2025
34 checks passed
@zhiltsov-max zhiltsov-max deleted the dl/stream-export branch March 14, 2025 15:29
@zhiltsov-max zhiltsov-max mentioned this pull request Mar 14, 2025
6 tasks
@cvat-bot cvat-bot bot mentioned this pull request Mar 24, 2025
3 participants