Feat/add syracuse qdr retriever #1

george42-ctds · 2024-05-30T22:13:32Z

JIRA ticket: HP-1459

New Features

Add Syracuse QDR retriever function

Breaking Changes

Bug Fixes

Improvements

Dependency updates

Deployment changes

george42-ctds · 2024-06-03T18:53:51Z

We should merge PR #2 (GitHub Actions) before merging this PR. Moving back to draft for now.

mfshao · 2024-06-06T16:16:02Z

heal/qdr_downloads.py

+        Dict of download status
+    """
+
+    if not Path(download_path).exists():


nitpick: would it be better to default to "." (the current directory) if the user does not provide download_path?
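A minimal sketch of the suggested default, assuming the download function takes an optional download_path argument. The helper name `resolve_download_path` is hypothetical, and returning None for a missing directory is an assumption modeled on the existence check quoted above.

```python
import logging
from pathlib import Path
from typing import Optional, Union

logger = logging.getLogger(__name__)


def resolve_download_path(
    download_path: Optional[Union[str, Path]] = None,
) -> Optional[Path]:
    """Return the directory to download into, defaulting to the current directory.

    Hypothetical helper: returns None (after logging an error) when the
    directory does not exist, mirroring the Path(...).exists() check above.
    """
    # Fall back to "." when the caller does not supply a path
    path = Path(download_path) if download_path is not None else Path(".")
    if not path.exists():
        logger.error(f"Download path {path} does not exist.")
        return None
    return path
```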

mfshao · 2024-06-06T16:21:56Z

heal/qdr_downloads.py

+        # unpack the zip file
+        try:
+            logger.debug(f"Ready to unpack {filepath}.")
+            unpackage_object(filepath=filepath)


So we are zipping all the files we intend to download into a zip and then unzipping it at the end? I'm curious why we are doing this: what is the benefit of this solution vs. downloading each file separately and putting them into a dedicated directory?

george42-ctds · 2024-06-13

The QDR API returns zip files for GET:study_id and POST:bulk file_ids requests. Single files can also be fetched with a GET request that does not return a zip file. I can add a flag for downloading file_ids in bulk (zipped) or one at a time (not zipped).

For the latter, should we rename the file after download? For example, if we download file_id=45139, should we rename the downloaded file from ‘45139’ to ‘Baum-et-al_INTERVIEW_GUIDE.pdf’, or just log the filename at download?
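Because the GET:study_id and POST:bulk responses are zip files, they still need the unpack step quoted above (`unpackage_object(filepath=filepath)`). Its body is not shown on this page, so the following is only a sketch of an equivalent using the standard-library zipfile module. The name `unpack_zip`, extracting next to the archive, and deleting the zip afterwards are all assumptions.

```python
import zipfile
from pathlib import Path
from typing import List, Union


def unpack_zip(filepath: Union[str, Path], remove_archive: bool = True) -> List[str]:
    """Extract a downloaded zip next to the archive and return the member names.

    Hypothetical stand-in for unpackage_object, whose body is not shown here.
    """
    archive_path = Path(filepath)
    with zipfile.ZipFile(archive_path) as archive:
        # Skip directory entries; extractall() recreates directories as needed
        members = [n for n in archive.namelist() if not n.endswith("/")]
        # zipfile strips absolute-path prefixes and ".." components on extraction
        archive.extractall(path=archive_path.parent)
    if remove_archive:
        # Assumption: the zip is no longer needed once its contents are on disk
        archive_path.unlink()
    return members
```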

mfshao · 2024-06-13

I don't think we should use the bulk endpoint for downloading by file ID; I remember they said that endpoint has some size limitations. Even though we are very unlikely to hit that limitation with the HEAL studies currently on QDR, downloading the files one by one by file ID is still a better user experience. So we don't need to worry about the bulk file download endpoint for now.
But we should definitely rename each downloaded file using its original filename.
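A sketch of the one-at-a-time approach, assuming a per-file download callable so that one failed file does not stop the rest. The names `download_each` and `fetch_one`, and the "downloaded"/"failed" status values, are hypothetical; the PR's docstring only says the function returns a "Dict of download status".

```python
import logging
from typing import Callable, Dict, Iterable

logger = logging.getLogger(__name__)


def download_each(
    file_ids: Iterable[str], fetch_one: Callable[[str], str]
) -> Dict[str, str]:
    """Download files one at a time and report a status per file ID.

    fetch_one is a hypothetical callable that downloads a single file
    (e.g. from GET:datafile/{id}) and returns the filename it saved.
    """
    status: Dict[str, str] = {}
    for file_id in file_ids:
        try:
            filename = fetch_one(file_id)
            # Log the original filename so the ID-to-name mapping is visible
            logger.info(f"Downloaded file {file_id} as {filename}")
            status[file_id] = "downloaded"
        except Exception as exc:  # keep going if a single file fails
            logger.error(f"Could not download file {file_id}: {exc}")
            status[file_id] = "failed"
    return status
```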

george42-ctds · 2024-06-14

Refactored to download single files from the GET:datafile/{id} endpoint. The data are written to the filename listed in the Content-Disposition response header.
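A sketch of how the filename could be read from the Content-Disposition header with the standard library's email.message.Message, and how a streamed body could be written to disk. Both function names are hypothetical, falling back to the file ID (e.g. ‘45139’) when no filename is present is an assumption, and the chunks are assumed to come from something like requests' `Response.iter_content(...)` on a streaming response.

```python
from email.message import Message
from pathlib import Path
from typing import Iterable, Optional, Union


def filename_from_content_disposition(header: Optional[str], fallback: str) -> str:
    """Return the filename named in a Content-Disposition header, else fallback."""
    if not header:
        return fallback
    msg = Message()
    msg["Content-Disposition"] = header
    # get_filename() handles quoting and RFC 2231 filename* parameters
    name = msg.get_filename()
    # Drop any directory components the server might include
    return Path(name).name if name else fallback


def write_chunks(
    chunks: Iterable[bytes], download_dir: Union[str, Path], filename: str
) -> Path:
    """Write streamed chunks to download_dir/filename and return the path."""
    dest = Path(download_dir) / filename
    with open(dest, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
    return dest
```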

github-actions · 2024-06-06T16:28:38Z

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

mfshao

looks good! 👍 great work

george42-ctds added 7 commits May 30, 2024 13:28

(HP-1459): update README and gitignore

5ffbfd3

(HP-1459): add pyproject.toml, poetry.lock

97027ef

(HP-1459): add qdr_downloads module

a623c4a

(HP-1459): add unit tests

78d4096

(HP-1459): add ci.yaml and ci_commands_script.sh

e3b26c2

(HP-1459): add image_build_push and wool

b30a114

(HP-1459): add secrets.baseline and pre-commit-config

9ddd867

george42-ctds marked this pull request as draft May 30, 2024 22:31

george42-ctds added 7 commits May 30, 2024 18:34

(HP-1459): expand documentation for qdr_download

6062069

(HP-1459): use streaming in download

5aac914

(HP-1459): clean up tests

eb17006

(HP-1459): clean up request_headers test

6d1a0ce

(HP-1459): clean up tests

c114766

(HP-1459): clean up comments

b1ce9e6

(HP-1459): add content to README

36201e3

george42-ctds marked this pull request as ready for review June 3, 2024 14:56

george42-ctds marked this pull request as draft June 3, 2024 18:52

george42-ctds and others added 2 commits June 3, 2024 13:13

Merge branch 'master' into feat/add-syracuse-qdr-retriever

9919689

(HP-1459): remove github action for 'image_build_push'

9246ae2

george42-ctds marked this pull request as ready for review June 3, 2024 21:09

mfshao reviewed Jun 6, 2024

Merge branch 'master' into feat/add-syracuse-qdr-retriever

a8cc7cc

george42-ctds added 4 commits June 13, 2024 09:37

(HP-1459): handle None download_url

b34a826

(HP-1459): allow downloading of single files, unzipped

b6e9f36

(HP-1459): get file_ids one at a time

1968fe6

(HP-1459): remove unused variable

c43d128

mfshao approved these changes Jun 14, 2024

(HP-1459): remove debug statements

e73e091

george42-ctds added 2 commits June 17, 2024 10:04

(HP-1459): add instructions for pip install from repo

2afcd86

(HP-1459): move 'pip install' section further down in README

061f62f

mfshao approved these changes Jun 18, 2024

mfshao merged commit 6ea6d75 into master Jun 18, 2024
7 checks passed

mfshao deleted the feat/add-syracuse-qdr-retriever branch June 18, 2024 18:12
