-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/add syracuse qdr retriever #1
Conversation
We should merge PR #2, with Github actions, before merging this PR. Moving back to draft for now. |
Dict of download status | ||
""" | ||
|
||
if not Path(download_path).exists(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: would it be better to default to .
if download_path
is not provided by the user?
heal/qdr_downloads.py
Outdated
# unpack the zip file | ||
try: | ||
logger.debug(f"Ready to unpack {filepath}.") | ||
unpackage_object(filepath=filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we are zipping all the files we are intended to download into a zip and then unzip it at the end? I'm curious why are we doing this, like what is the benefit of this solution vs downloading each individual file separatly and put them into a dedicated directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The QDR provides zip files for the GET:study_id and POST:bulk file_ids. We can also get single files via a GET request that is not a zip file. I can add a flag for downloading file_ids in bulk (zip) or one at a time (not zipped).
For the latter, should we rename the file after download? For example, if we download file_id=45139 should we rename the downloaded file from ‘45139’ to ‘Baum-et-al_INTERVIEW_GUIDE.pdf’? Or just log the filename at download?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not use the bulk endpoint for downloading using file IDs, I remember they said there are some size limitations on that endpoint. Even though we are very unlikely to hit by that limitation with the current HEAL studies they have on QDR, I think it is still better user experience to download them one by one by using their file IDs. So we don't really need to worry about the bulk file download endpoint for now
But we should definitely rename each individual files using their original filenames
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to download single files from the GET:datafile/{id} endpoint. The data get written into the filename that is listed in the Content-Disposition response header.
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! 👍 great work
JIRA ticket: HP-1459
New Features
Breaking Changes
Bug Fixes
Improvements
Dependency updates
Deployment changes