Warn or stop a source from submitting documents with compromising metadata #122

Closed
bxjx opened this issue Nov 7, 2013 · 10 comments

Comments

@bxjx
Contributor

bxjx commented Nov 7, 2013

Uploading documents containing metadata may reveal information about the source. E.g. http://www.theguardian.com/world/2012/dec/04/john-mcafee-confirms-guatemala-vice

Metadata can be scrubbed by a journalist on the viewing station using the Metadata Anonymization Toolkit (MAT), but I think it would be better if the source were prevented from, or at least warned about, submitting these documents in the first place. Ideally the source would use MAT, but this might be too much to expect.

Perhaps SecureDrop could have a configuration option to prevent the upload of any document except those that could be scanned for metadata on the client side?

I've written some code that scans PDFs before they are uploaded and asks the user to confirm that none of the metadata in the document compromises their identity. The code requires JavaScript but does not use any extensions or server communication. It does require a modern browser, and it works on the version of Firefox included in the Tor Browser Bundle. I think I could write similar code for JPGs and possibly other formats.
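For illustration only, a scan of this kind could look roughly like the sketch below. It is written in Python rather than JavaScript and is not the code from the branch: the function name `scan_pdf_info` and the regex approach are assumptions made for this example. It only sees uncompressed literal strings in the PDF document information dictionary, so hex strings, XMP packets, and compressed object streams would all be missed.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = (
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate",
)

# Matches "/Key (literal string)", allowing backslash-escaped characters.
# Assumption: values are uncompressed literal strings without nested
# unescaped parentheses.
_INFO_ENTRY = re.compile(
    rb"/(" + "|".join(INFO_KEYS).encode("ascii") + rb")\s*\(((?:\\.|[^\\()])*)\)",
    re.DOTALL,
)


def scan_pdf_info(data: bytes) -> dict[str, str]:
    """Return the Info-dictionary entries found in raw PDF bytes."""
    found: dict[str, str] = {}
    for match in _INFO_ENTRY.finditer(data):
        key = match.group(1).decode("ascii")
        # Latin-1 never fails to decode; real files may use other encodings.
        found[key] = match.group(2).decode("latin-1")
    return found
```

The returned dictionary is what a confirmation prompt would show to the source, for example the `Author` and `Producer` values.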

On a related note, it might be worth preventing, or warning about, the upload of certain document types that may contain hidden information other than metadata. E.g. Microsoft Word documents may contain past edits that can be reverted and therefore retrieved.
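As a rough sketch of what such a check might look like for the newer `.docx` format (the older binary `.doc` format would need a different approach), the snippet below counts tracked insertions and deletions in `word/document.xml`. The function name `count_tracked_changes` is hypothetical, and the sketch assumes revisions are stored as `w:ins` and `w:del` elements in the main document part only. Note that the standard-library XML parser is not hardened against maliciously constructed input, so this should not be run on untrusted files as is.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tracked_changes(docx_bytes: bytes) -> dict[str, int]:
    """Count tracked insertions and deletions in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return {
        "insertions": sum(1 for _ in root.iter(f"{{{W_NS}}}ins")),
        "deletions": sum(1 for _ in root.iter(f"{{{W_NS}}}del")),
    }
```

A non-zero count would be the trigger for a warning that earlier wording can still be recovered from the file.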

This issue is probably also related to #101 and #119.

@bxjx
Contributor Author

bxjx commented Nov 7, 2013

Screenshot:
[Image: securedrop with metadata scanning]

@dolanjs
Contributor

dolanjs commented Nov 7, 2013

@bxjx as you noted, we are working on issue #119, and we recommend that journalists scrub files before transferring them off the secure viewing station for publication. At the same time, we need to take into consideration that metadata can also have journalistic value (though we try to be as clear as possible with the source about various threats, including metadata).

@bxjx
Contributor Author

bxjx commented Nov 7, 2013

Thanks @dolanjs, I think resolving #119 will really help!

I also take the point that if SecureDrop were to encourage the user to disable JavaScript, then it would be counterproductive to build functionality that requires it.

Asking the journalist to scrub the data does mean that there is an encrypted version of a potentially compromised file on the server until the journalist deletes it. Perhaps this is not a big issue.

I also do worry about sources having to read about and understand metadata and scrubbing rather than being prompted if the metadata can be detected.

@garrettr
Contributor

garrettr commented Nov 7, 2013

@bxjx That is a really cool PoC! Will you share your metadata-detecting JavaScript?

The proposed interface needs careful consideration. As @dolanjs points out, metadata can have journalistic value. We should also keep the submission process as simple as possible to encourage sources, and popups or confirmations make that process more complex.

I could see this being useful either as an optional service (maybe a "screen my submission for metadata" button on the upload page) or as something on the journalist interface (to give them an overview, easier to use than a separate program like MAT but complementary in function). Ultimately I think the responsibility of handling metadata lies with the journalists.

> I also do worry about sources having to read about and understand metadata and scrubbing rather than being prompted if the metadata can be detected.

Your UX is certainly nicer. Again, I think both the documentation on metadata and any metadata-detecting service should be optional.

@bxjx
Contributor Author

bxjx commented Nov 7, 2013

@garrettr, it's on a branch at https://github.com/TheGlobalMail/securedrop/tree/metadata-scan. See TheGlobalMail@0c7d6d9. The UX is still pretty rough. It works on the latest versions of WebKit, Firefox, and IE10+.

The "screen my submission" button is an interesting idea!

@psivesely
Contributor

Our stance on JS in the source interface has not shifted in years, so I'm going to go ahead and close this one, even though it's a cool idea.

@redshiftzero
Contributor

Does this need to be done in JS? What about having this done on the server side? I realize that this is not ideal, but until we have a magical browser extension that executes signed JS, it's probably better than just having sources submit documents with all kinds of metadata they might not even realize are there. The app code would examine the document, return some useful feedback to the source ("Hey Col Biggins, you might want to remove your name from the Author field"), and let them strip off the metadata they don't want the journalist to know.

@psivesely
Contributor

Processing documents adds greatly to the attack surface. Submissions may take hours to transfer, only to warn the source they may want to use some tool they've never heard of and then re-submit. How do we distinguish useful metadata (e.g., DKIM on someone else's emails) from harmful metadata (e.g., time data in JPG metadata in a photo taken covertly by the source)?

@redshiftzero
Contributor

The processing you just described could be done on the server side, no? It seems like a good first step would be just to display to the source "Here's the metadata your file has in it". Instead of referring them to another tool, we could just ask them to check the fields they'd like to be wiped. If we try to handle every possible type of file, this could get really unwieldy, but if we stick to the most common files, e.g. PDFs, then this could be a very nice and useful feature that helps sources understand what they are actually doing and protect their identity when they are leaking documents.
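A very rough sketch of the "check the fields you'd like wiped" step for PDFs, under strong assumptions: the function name `blank_pdf_info_fields` is hypothetical, and it only touches uncompressed literal strings in the Info dictionary. Values are overwritten with the same number of spaces so that the byte offsets recorded in the cross-reference table stay valid. XMP metadata and copies of the values kept in compressed streams are not touched, so this is not a substitute for a real scrubbing tool.

```python
import re


def blank_pdf_info_fields(data: bytes, fields: set[str]) -> bytes:
    """Overwrite the values of the chosen Info keys with spaces, in place."""
    if not fields:
        return data
    keys = "|".join(re.escape(name) for name in sorted(fields)).encode("ascii")
    # Group 1: "/Key (", group 2: the literal string body, group 3: ")".
    pattern = re.compile(
        rb"(/(?:" + keys + rb")\s*\()((?:\\.|[^\\()])*)(\))",
        re.DOTALL,
    )

    def _blank(match: re.Match) -> bytes:
        # Same length in, same length out: later object offsets do not move.
        return match.group(1) + b" " * len(match.group(2)) + match.group(3)

    return pattern.sub(_blank, data)
```

Here `fields` would be the set of boxes the source ticked, for example `{"Author", "Creator"}`; unselected fields are left exactly as they were.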

@psivesely
Contributor

The processing I described is heavily problematic for the reasons I described. Processing of documents on the Application Server opens up a huge vector for compromise, not to mention how this may hurt UX. I think we'll just end up scaring away sources who won't be able to grasp the significance of all the metadata we display to them.

@psivesely psivesely removed this from the 0.4 milestone Dec 15, 2016

5 participants