Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API to download export results #9075

Merged
merged 57 commits into from
Mar 13, 2025
Merged

Conversation

Marishka17
Copy link
Contributor

@Marishka17 Marishka17 commented Feb 7, 2025

Motivation and context

This PR introduces the following changes:

  • Several server API endpoints have been deprecated or no longer handle the process of exporting specific resources:
    • [API] GET /api/projects/id/dataset?action=import_status is deprecated
    • [API] GET /api/projects/id/dataset(no action parameter or action != import_status) returns 410 status code - API endpoint no longer used to handle export datasets flow
    • [API] GET /api/projects/id/annotations returns 410 status code - API endpoint no longer used to handle export annotations flow
    • [API] GET /api/projects/id/backup returns 410 status code - API endpoint no longer used to handle export backups flow
    • [API] GET /api/tasks/id/backup returns 410 status code - API endpoint no longer used to handle export backups flow
    • [API] GET /api/tasks/id/annotations?format= returns 410 status code - API endpoint no longer used to handle export annotations flow
    • [API] GET /api/tasks/d/dataset returns 410 status code - API endpoint no longer used to handle export datasets flow
    • [API] GET /api/jobs/id/annotations?format= returns 410 status code - API endpoint no longer used to handle export annotations flow
    • [API] GET /api/jobs/id/dataset returns 410 status code - API endpoint no longer used to handle export datasets flow
  • Introduced new "private" endpoints to download prepared files (they are hidden in the generated server schema):
    • [API] GET /api/projects/id/dataset/download?rq_id=rq_id
    • [API] GET /api/projects/id/backup/download?rq_id=rq_id
    • [API] GET /api/tasks/id/dataset/download?rq_id=rq_id
    • [API] GET /api/tasks/id/backup/download?rq_id=rq_id
    • [API] GET /api/jobs/id/dataset/download?rq_id=rq_id
  • Changed permissions used when downloading a prepared file (now admin and RQ job owner have rights to download a prepared file related to a specific background job (it's a questionable topic whether should be resource access checked also))
  • Added result filename saving to RQ metadata for export-related RQ jobs:
{
    ...
    "result_filename": str // is used as the final filename when downloading files/uploading to cloud storage
}
  • Final file name is taken from meta when uploading a file to cloud storage (key and key_pattern args have been removed)

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)

TODO:

  • Extract unrelated fixes/changes into separate PRs

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@Marishka17
Copy link
Contributor Author

/check

@Marishka17
Copy link
Contributor Author

@zhiltsov-max, Could you please check that deprecated API is not used in Human Protocol oracles (and update if needed)?

"""),
parameters=[
# --- Deprecated params section ---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these parameters are removed, not just deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not remove these parameters from the specification because they are still used to determine whether a request is made to export the dataset or just to get raw annotations. So, I think it would be better to remove them from the schema once the endpoint no longer returns a 410 response.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. By listing these parameters in the schema you are making the claim that they are supported, whereas in fact they are not. I can see the utility in keeping them around for a short while, so that people who read the schema can understand what happened to them; but in this case you should change the description to something accurate, e.g. "this parameter is no longer supported".

FWIW, I also think we should keep the logic that returns 410 if these parameters are present indefinitely, or at least for a long time. It costs very little, and it ensures that clients that use the old API get an error. OTOH, we probably don't need to keep these parameters in the schema for very long.

response = self.client.put(url, data=data)
return response

def _wait_request_to_be_finished(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's odd to see this here. IIRC, for unit tests we run RQ jobs in synchronous mode, so no waiting should be necessary - the job should be complete immediately after the initiating request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's true, but I believe there are still several advantages of adding a waiting step:

  1. We do not need to update related tests if we run these tests in asynchronous mode
  2. This way we test that /api/requests/rq_id API returns the expected code (200, 401, 403)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we run these tests in asynchronous mode

Is it possible?

This way we test that /api/requests/rq_id API returns the expected code (200, 401, 403)

I've nothing against calling this API, I just don't think we need to do it in a loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible?

Warum nicht?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warum nicht?

Well... how would you do it? AFAICS, cvat.settings.testing forces all queues to use sync mode. Moreover, it also uses a fake Redis implementation, so it wouldn't be possible to run RQ workers anyway.

@Marishka17 Marishka17 force-pushed the mk/api_to_download_export_results branch from 9418de4 to 4dee705 Compare March 12, 2025 10:49
@Marishka17 Marishka17 changed the title API to download export results [do not merge] API to download export results Mar 12, 2025
@Marishka17 Marishka17 changed the title [do not merge] API to download export results API to download export results Mar 12, 2025
@Marishka17 Marishka17 force-pushed the mk/api_to_download_export_results branch 2 times, most recently from 4ef5485 to 5d01bc9 Compare March 12, 2025 11:34
@Marishka17 Marishka17 force-pushed the mk/api_to_download_export_results branch from 5d01bc9 to 4b01053 Compare March 12, 2025 11:35
@SpecLad SpecLad merged commit 950f9b6 into develop Mar 13, 2025
34 checks passed
@SpecLad SpecLad deleted the mk/api_to_download_export_results branch March 13, 2025 13:50
@cvat-bot cvat-bot bot mentioned this pull request Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants