Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract and display original document filenames #452

Merged
merged 4 commits into from
Jul 2, 2019
Merged

Extract and display original document filenames #452

merged 4 commits into from
Jul 2, 2019

Conversation

rmol
Copy link
Contributor

@rmol rmol commented Jun 27, 2019

Description

Files submitted to the SecureDrop server are gzipped with their
original name, then encrypted with GPG. When downloaded, we can
extract the original filename from the gzip header, so they can be
opened with an appropriate application.

This change adds the db.File.original_filename column, and populates
it when downloading and decrypting files. The file is still stored as
it has been, under the db.File.filename value stripped of extensions,
but when opened, that file is first hard linked to the original
filename.

Since we can't wait for the file to be opened (starting the disposable
VM is pretty slow), we can't immediately remove that link, so
storage.delete_single_submission_or_reply_on_disk now cleans it up if
the object being deleted has original_filename populated.

Fixes #163.

Test Plan

Check out this branch. Run the client. Upload a file with a .txt extension via the source interface, and confirm that when you refresh the client, click on the source, then click on the Download item, that the name changes to the name of the file as you uploaded it.

Then check that clicking on the file again opens it up in a suitable application in the disposable VM.

For bonus points, delete the source, or at least the submission, and confirm that upon refresh the data directory is properly cleaned up -- there should be no files with names matching the File's filename or original_filename.

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs, network (via the RPC service) traffic, or fine tuning of the graphical user interface, Qubes testing is required. Please check as applicable:

  • I have tested these changes in Qubes
  • I do not have a Qubes OS workstation (the reviewer will need to test these changes in Qubes)

Files submitted to the SecureDrop server are gzipped with their
original name, then encrypted with GPG. When downloaded, we can
extract the original filename from the gzip header, so they can be
opened with an appropriate application.

This change adds the db.File.original_filename column, and populates
it when downloading and decrypting files. The file is still stored as
it has been, under the db.File.filename value stripped of extensions,
but when opened, that file is first hard linked to the original
filename.

Since we can't wait for the file to be opened (starting the disposable
VM is pretty slow), we can't immediately remove that link, so
storage.delete_single_submission_or_reply_on_disk now cleans it up if
the object being deleted has original_filename populated.
Copy link
Contributor

@redshiftzero redshiftzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two quick notes

@@ -166,7 +168,7 @@ def call_download_api(self, api: API, db_object: Reply) -> Tuple[str, str]:
sdk_object.source_uuid = db_object.source.uuid
return api.download_reply(sdk_object)

def call_decrypt(self, filepath: str, session: Session = None) -> None:
def call_decrypt(self, filepath: str, session: Session = None) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to update the comment for this method in the DownloadJob base class to explain what kind of string it should return since calling decrypt will return both the original filename and filepath strings. It would be helpful to know that we want to get the decrypted original filename so we can store it in the local db.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Whatever we decide to return from decrypt_submission_or_reply should be explained here.

Copy link
Contributor

@sssoleileraaa sssoleileraaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with a few nits

@sssoleileraaa
Copy link
Contributor

Works as advertised, but one last question: Is there a reason to not show file extension? It seems like it would be more useful to show it so the journalist knows what they're downloading. An image, pdf, etc.

@sssoleileraaa
Copy link
Contributor

sssoleileraaa commented Jul 1, 2019

ah, I know you were unable to access zeplin before, but it looks like the prototypes show file extension (see below)

https://app.zeplin.io/project/5c807ea562f734bd2756b243/screen/5cd3b87518e9f734ae0218bb

@sssoleileraaa sssoleileraaa merged commit 642a8b2 into freedomofpress:master Jul 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extract and display filenames of downloaded documents
3 participants