
Prevent automated submissions (CAPTCHA) #3

Closed
klpwired opened this issue May 14, 2013 · 30 comments
Labels
help wanted, stale

Comments

@klpwired
Contributor

Changed the name due to debate over whether a CAPTCHA is the right method to use. Consensus is on developing a responsive approach that combines several methods, including CAPTCHAs.

@stereoscott

A little note if I may: mobile submissions + autocorrect on a phone make CAPTCHAs quite frustrating... I realize you may not want to use a third-party system, but something like areyouahuman, which uses a game to prevent bot submissions, is worth a quick look (and they have a Python library).

@bitsteak
Contributor

Something like hashcash, a proof-of-work system, might be a better option.
http://www.hashcash.org/blog/
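For illustration, here is a minimal sketch of the hashcash idea in Python. The function names and difficulty parameter are invented for the example, not taken from SecureDrop or hashcash's reference code: the server issues a random nonce, the client burns CPU to find a counter whose hash meets a difficulty target, and the server verifies the work with a single hash.

```python
import hashlib
import os

def make_challenge(bits=20):
    """Server: issue a random nonce plus a difficulty target."""
    return os.urandom(16).hex(), bits

def solve(nonce, bits):
    """Client: brute-force a counter until SHA-256(nonce + counter)
    begins with `bits` zero bits. Expected cost doubles per extra bit."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{nonce}{counter}".encode()).hexdigest()
        if int(digest, 16) >> (256 - bits) == 0:
            return counter
        counter += 1

def verify(nonce, counter, bits):
    """Server: a single hash suffices to check the claimed work."""
    digest = hashlib.sha256(f"{nonce}{counter}".encode()).hexdigest()
    return int(digest, 16) >> (256 - bits) == 0
```

The asymmetry (expensive to solve, cheap to verify) is what throttles bulk submissions without giving a human source a puzzle to solve.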

@ioerror

ioerror commented May 16, 2013

Why do you need a CAPTCHA?

@bitsteak
Contributor

We don't want someone using a bot to script junk submissions.

@ioerror

ioerror commented May 16, 2013

I suggest that rather than making it harder to upload, find a way to make it easier to sort out lots of irrelevant stuff on the back end. Every time a source has to jump through a hoop, they will probably reconsider their efforts, and the site's efforts, I might add.

@djon3s

djon3s commented May 16, 2013

@ioerror As I understand it, a captcha is one part of preventing a crippling attack given the current architecture. And that is the difference in ease between submitting a thing and decrypting/sorting it. Even with a captcha this difference will be there, but it will bring the multiple down to something like 1 to 10 in terms of time, instead of 1 to 1000s. I do agree that addressing the sorting mechanisms for the journalists will also be important, though.

@ioerror

ioerror commented May 16, 2013

djon3s:

@ioerror A captcha is one way to prevent a crippling attack given the current architecture. And that is the difference in ease between submitting it and sorting it.

I understand the general idea, though I think CAPTCHAs are generally flawed. Jonathan Wilkins's research on the topic is a classic example of someone automating an attack that would work against most deployed CAPTCHA systems.

In my experience, while such an attack may occur, submitters who have a hard time are universally impacted: if you ever have a submission that is legitimate, any fence-post security may completely discourage them.

If the system is run well, unless you're getting endlessly attacked, I'd suggest you not set out to solve this problem by just throwing a CAPTCHA at it. The human involved in the submission process may be adversely affected enough to discourage them from submitting the document.

I'd rather find an intern to sort through data on the back end than have some important data not uploaded.

@dolanjs
Contributor

dolanjs commented May 16, 2013

The issue is not so much sorting out the backend but preventing/slowing down automated attacks. When not tracking IP or other identifiable information about the source traffic, how do you do any type of throttling that doesn't just DoS legit users?

@ioerror

ioerror commented May 16, 2013

I understand the general problem. I do think that your proposed solution creates a social problem that is in conflict with the general goals - that is - people sending data that is wanted may be unsure, or may quit when faced with any difficulty at all.

If anything, many uploads on the same connection may be given the same identifier - so you can correlate uploads by connection, for example, rather than by IP. This doesn't do anything other than link submissions, so it isn't privacy-violating, and generally only abusers will be affected.

In any case, I understand that there is a problem that may occur and that CAPTCHAs are the proposed solution. Some attackers do very well against CAPTCHAs, so I'll ask a counter question - what will you do when the CAPTCHA is broken? When that answer is obvious, we'll have a suitable answer that likely satisfies everyone.

@Taipo

Taipo commented Oct 12, 2013

Part of the issue is the low available bandwidth on a hidden service connection, which makes it relatively easy to pin a webserver to the wall if it allows file uploads and large POST data volumes, using elementary methods of denial-of-service attacking. Just ask all those Anons who attack hidden service sites as part of Op[Add your name here]. When they eventually give up on the GET/POST flood attacks, resources are freed up and the site's accessibility returns, because they haven't overwhelmed the server resources, just the data connection, which in the case of Tor can be as low as 30 kbps.

Yeah, I completely agree about the problems with using captchas; however, the continued unavailability of a secure drop can also lead to social problems. Maybe Zooko needs to be called on to ding his triangle?

@boite

boite commented Oct 15, 2013

CAPTCHA is a very poor solution to this tricky problem. An adversary who is determined to automate submissions doesn't have to work very much harder to defeat CAPTCHAs. At best, CAPTCHA may prevent fully automatic submissions, but it certainly won't prevent semi-automatic submissions having a human in the loop to solve each CAPTCHA puzzle.

The reduction in usability and accessibility which results from CAPTCHA is not worth the very small additional protection it provides.

In any event, is the problem 'prevent automated submissions' or is it better stated as 'prevent resource exhaustion'?

@fpietrosanti

At GlobaLeaks we are also going to integrate a CAPTCHA as a measure to make it a little bit more difficult to implement an automated flood of submissions: globaleaks/globaleaks-whistleblowing-software#189.

That's not going to stop a knowledgeable adversary; that requires a multi-layer approach to preventing a submission flood, including additional round-trips and increasing delays, properly managed in the UI but creating inefficiency for the flooder.

@fpietrosanti

At GlobaLeaks we just finished writing a proposal to implement submission flood resiliency, which may be of help:
https://docs.google.com/document/d/1P-uHM5K3Hhe_KD6YvARbRTuqjVOVj0VkI7qPO9aWFQw/edit?usp=sharing

@trevortimm
Contributor

@fpietrosanti How many times in practice have media organizations running GlobaLeaks run into this problem? Is it still theoretical at this point?

@fpietrosanti

@trevortimm It's already the 2nd time that has happened, always "before" the release of a story or "after" news came out. In the previous case there were tons of new files being uploaded to existing submissions; in this case (the flood is ongoing now) there are many new submissions being created.

The end-user effect is that:

  • Receivers get tons of notifications
  • The Tip List in the UI is very long and difficult to deal with

However, given the slowness of Tor hidden services, we have never seen a resource exhaustion attack.

@garrettr
Contributor

garrettr commented Nov 4, 2013

Something like hashcash, a proof-of-work system, might be a better option.

@bitsteak We would need to require users to have Javascript enabled to do this, which is counter to our current strategy (#100, #101). Also, natively coded bots would always have a speed/parallelism advantage, although we could probably do OK with asm.js. However, this is a cool idea, and may be something to consider in the future if we dispense with the strict no-JS stance.

@garrettr
Contributor

garrettr commented Nov 4, 2013

Every time a source has to jump through a hoop, they will probably reconsider their efforts and the site's efforts, I might add.

@ioerror Our sources already have to jump through the significant hoop of downloading and installing the Tor Browser Bundle to access our site. A per-submission CAPTCHA is far less inconvenient.

Availability to sources is an important goal, but so is availability to journalists. You are assuming that journalists have unlimited resources (time, interns) to throw at this problem, which would be inefficient even if it were true. I also predict it would discourage journalists from using the site, and there is always the possibility that legitimate submissions will be lost in the DoS flood.

@garrettr
Contributor

garrettr commented Nov 4, 2013

We will need to use several tactics in tandem to defeat this problem (@fpietrosanti's outline above is a great overview). I propose an evolutionary approach.

We should start by implementing a time-delay based approach. We can timestamp the request in the auth cookie, and drop submissions that arrive less than X seconds later. Since the submission form requires significant user interaction (using the file picker UI, or typing a message), we can assume that a response returned with a delay of less than several seconds is likely an automated submission and can be dropped (or we can be nice and return a warning, e.g. "Are you a human?", with the option to submit again after a longer timeout). If the site is under attack, we can increase this timeout interval. It would be difficult to make a friendly UI for this without Javascript.
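A minimal Flask sketch of this cookie-timestamp idea; the route names, threshold, and placeholder bodies are hypothetical, not SecureDrop's actual code:

```python
import os
import time

from flask import Flask, abort, session

app = Flask(__name__)
app.secret_key = os.urandom(32)  # Flask signs the session cookie with this

MIN_SUBMIT_SECONDS = 5  # could be raised while the site is under attack

@app.route("/lookup")
def lookup():
    # Stamp the time the form was served into the signed session cookie.
    session["form_served_at"] = time.time()
    return "...render the submission form here..."

@app.route("/submit", methods=["POST"])
def submit():
    served_at = session.pop("form_served_at", None)
    if served_at is None or time.time() - served_at < MIN_SUBMIT_SECONDS:
        # Faster than a human could plausibly fill in the form.
        abort(429)
    return "...store the submission here..."
```

Because the cookie is signed, a client can't forge the timestamp; but as liliakai's commit message below notes, a saved cookie can still be replayed, so server-side state is needed to close that hole.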

I think we should also implement CAPTCHAs. We should have them pref'd off by default, but set up so they can be quickly enabled if the site is under attack. Perhaps these countermeasure toggles could be exposed in the journalist's web interface, so they can take action as soon as they start to see unusual activity.

Finally, we need to improve the journalist's interface so they can more easily review, sort, and delete submissions and collections of submissions.

@Taipo

Taipo commented Nov 5, 2013

It would be difficult to make a friendly UI for this without Javascript.

As @fpietrosanti pointed out, resource exhaustion is currently quite a difficult task against Tor-hosted webservers. So not every request type needs to be tracked and 'counted'; tracking POST requests alone would be sufficient to prevent mass submissions and dictionary attacks against codenames. Why would you need to use javascript?

@garrettr
Contributor

garrettr commented Nov 5, 2013

Why would you need to use javascript?

I was referring to a potential UI where, if a submission is made "too quickly", we return a page saying "You submitted that faster than we expected - are you a human?". Stack Overflow does this. To handle false positives, we should give the option to resubmit after a longer timeout, e.g. "you can try submitting again in 30 seconds". I'm saying that such an interface would be much more friendly if it had some kind of visible countdown on the page, which cannot be implemented without JS.

@Taipo

Taipo commented Nov 5, 2013

@garrettr The friendliness of that sort of UI might be grounds to use javascript in that instance alone, since the flood scenario should not be the experience of your average source, who, unless they were in a mad hurry to get files uploaded, would rarely break a flood rule. Whatever the case, the cost in resources must be low.

But returning to the grounds for CAPTCHAs, first off I find them hideous and try to avoid services that use them where possible.

Using CAPTCHA would be the quick and easy method of defeating large-scale DoS attacks where the attack is coming from many connections to many codenames (a botnet attack), but would not be effective on its own in the long term against a well-resourced attacker.

Sessions are another way of dealing with this, as you and others pointed out, but creating a session hash based on the codename is not enough (considering there is only one IP address, 127.0.0.1, for all connections). It would have to be based on the actual connection itself, as pointed out by @ioerror, to limit both a single attacker using multiple pre-registered codenames and a single attacker making multiple POSTs/file uploads against a single codename.

So I propose an alternative multi-prong approach to this.

BOTNET attacks:

  • Connection rate per minute monitoring (let's say 100 connections per minute, regardless of how many sources are behind the connections)
  • Where the rate limit is exceeded, globally trigger the CAPTCHA for anyone making submissions during a detected flood. This should slow or stop a BOTNET attack automatically.

Single Attacker Based Attack:

  • Connection-based sessions to capture floods from single connections. Connection hash(x) is allowed to make y submissions per minute; trigger the CAPTCHA on an attacker who breaks these rules. Keep in mind that a single attacker could pre-register 1000 codenames and make one submission every 0.03 seconds, toggling through each codename, and still not break the rules of your average flood protection method.

It is a work in progress, something I have been thinking about over the last week. I will give it some more thought and get back to you with any alterations to the concept.
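A sketch of the global trigger described above, with made-up thresholds (a real deployment would also have to share this state across worker processes):

```python
import time
from collections import deque

class FloodDetector:
    """Sliding one-minute window over request timestamps. When the
    global rate is exceeded, the CAPTCHA requirement switches on for
    everyone until the flood subsides."""

    def __init__(self, max_per_minute=100):
        self.max_per_minute = max_per_minute
        self.events = deque()

    def record(self):
        now = time.time()
        self.events.append(now)
        # Drop timestamps that have aged out of the window.
        while self.events and self.events[0] < now - 60:
            self.events.popleft()

    def captcha_required(self):
        return len(self.events) > self.max_per_minute
```

record() would be called on every POST, and captcha_required() consulted when rendering the submission form.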

@garrettr
Contributor

garrettr commented Nov 5, 2013

Nice writeup, @Taipo.

@fpietrosanti

@Taipo @garrettr I do think that in any case we should never "block" incoming connections but always throttle or delay them, enabling the whistleblower to make a submission anyway. That's because, as said by @Taipo, we don't have the source IP, and whoever is mounting a DoS has a big advantage.

The first goal in handling a flood is to avoid DoSing ourselves, so there's no 100% resilient action, only a set of many actions that can be taken to make the flood "less annoying" for the journos and the system. But we need to do that without "blocking functionality".

@heartsucker
Contributor

Working on this issue. Creating a Redis store that tracks counts of web events for the past 30 minutes and looks for spikes in activity. Spikes will trigger an email for now, and later will turn on CAPTCHAs on the site.
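Something along these lines, sketched with per-minute Redis counters; the key scheme and spike heuristic are illustrative, not the actual implementation:

```python
import time

import redis

r = redis.StrictRedis()
WINDOW_MINUTES = 30

def record_event(kind):
    """Increment this minute's counter; buckets expire on their own."""
    bucket = int(time.time() // 60)
    key = f"events:{kind}:{bucket}"
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, WINDOW_MINUTES * 60)
    pipe.execute()

def spike_detected(kind, factor=5):
    """Compare the current minute against the 30-minute average."""
    now = int(time.time() // 60)
    keys = [f"events:{kind}:{b}" for b in range(now - WINDOW_MINUTES, now + 1)]
    counts = [int(v) if v else 0 for v in r.mget(keys)]
    current, history = counts[-1], counts[:-1]
    baseline = max(1.0, sum(history) / len(history))
    return current > factor * baseline
```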

liliakai added a commit to liliakai/securedrop that referenced this issue Nov 11, 2013
Basic submission throttling for DoS prevention (freedomofpress#3). Only throttles
msg/doc submission. Does not defend against replay attacks. Minimum
submission interval is configurable in seconds.
@liliakai
Contributor

What does everyone think about putting a CAPTCHA on codename registration but not on document submission, then rate-limiting submissions per codename? These features can both be configurable or triggered on global DoS detection. This may help mitigate the pre-registered codename attack mentioned by @Taipo.

Also, if @ehartsuyker is adding Redis for global detection, then it can be used similarly for per-codename detection, à la http://flask.pocoo.org/snippets/70/ (as opposed to using a client-side cookie, which could be saved and reused by a clever attacker; thanks to @Hainish for pointing that out).
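A per-codename variant in the same spirit as that snippet; the key scheme and limits here are hypothetical. Keeping the counter server-side in Redis is what defeats the saved-cookie replay:

```python
import time

import redis

r = redis.StrictRedis()

def over_limit(codename_id, limit=5, per_seconds=60):
    """Fixed-window rate limit keyed on an identifier derived from the
    codename. `codename_id` should be a hash of the codename, not the
    codename itself, so no source identifier is stored in Redis."""
    window = int(time.time() // per_seconds)
    key = f"ratelimit:{codename_id}:{window}"
    pipe = r.pipeline()
    pipe.incr(key)               # count this submission in the window
    pipe.expire(key, per_seconds * 2)  # let old windows clean themselves up
    count, _ = pipe.execute()
    return count > limit
```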

@heartsucker
Contributor

I have moved away from using Redis and am keeping everything in memory as a dict.

@garrettr
Contributor

@liliakai Nice find on that throttling snippet! That could be very useful. I think if we were to use CAPTCHAs they should be on both registering codenames and submitting files - but I don't think we should have CAPTCHAs enabled for either of those pages by default.

dolanjs added a commit that referenced this issue Jun 2, 2014
Display the journalist's 2fa doc interface code
garrettr pushed a commit that referenced this issue Oct 29, 2014
garrettr pushed a commit that referenced this issue May 19, 2016
@redshiftzero added the "help wanted" label Aug 19, 2017
@heartsucker
Contributor

Marking as "Pending close" as this either requires a 3rd party (unacceptable) or adding a lib (something we try to avoid, to keep the app lean in the name of security / auditability).

If we look at the current state of CAPTCHAs, using Google and Cloudflare as the prime examples, they use a complex system of identifying images that are unlikely to be repeated for a given user. This is not something SD could support. The other "I'm not a robot" mouse-tracking-click-thing is also not an option, because 1) JavaScript, and 2) it requires all the fancy ML stuff to match it to known human patterns. Also not easy to support. That leaves us with some Python libs that generate messy text for humans to decipher. Getting the complexity high enough that a computer can't beat it makes it too hard for humans too these days. This was a realistic option for SD in 2013, but I think it's not anymore.

Lastly, if someone really wants to overload SD, Tor is likely to be the bottleneck. Someone could script the submission, still have a human solve the captcha, and be able to overload the network and/or fill the disk in a day or two.

@conorsch
Contributor

Great summary, @heartsucker. I'll add that to date we have not received any support requests from organizations running SecureDrop in production to add a CAPTCHA. So far, all the discussion toward this feature has assumed that organizations demand it—but we haven't heard that yet.

@ioerror's point from years ago:

I suggest that rather than making it harder to upload, find a way to make it easier to sort out lots of irrelevant stuff on the back end.

Still rings true to my ear, and jibes well with the workstation refresh we're looking at.

@redshiftzero
Contributor

I agree with @heartsucker in that it is challenging to devise a CAPTCHA that both:

  • uses no third parties and
  • wouldn't be trivial to defeat for an attacker capable of using e.g. Selenium to automate submissions to SecureDrop

The better solution is indeed to enable journalists to quickly sort through submissions (either manually, via a smoother journalist workflow on the Qubes workstation, or in the future potentially by learning characteristics of spam submissions). Since a CAPTCHA isn't the right approach here, I'm closing.

rmol added a commit to rmol/securedrop that referenced this issue Jun 4, 2019
CircleCI's branch filtering does not work properly with pull requests
from forks. The CIRCLE_BRANCH variable contains something like freedomofpress/pull/3
in this case. This means that docs and i18n PRs from forks are not
being tested as we wish; the translation tests are not run for i18n
PRs, and the app tests are being run for docs PRs.

A CircleCI feature request [1] to improve this was closed without
explanation, so I'm incorporating a workaround suggested in the
CircleCI forums: using the GitHub API to obtain the real branch name
for PRs from forks, and skipping tests if it doesn't match. Not all
steps of the relevant jobs are skipped, but the most expensive ones
are.

Also, stop skipping static-analysis-and-no-known-cves, as it doesn't
take that long, and might prevent problems from sneaking in on
branches with inaccurate names.

[1] https://discuss.circleci.com/t/only-build-pull-requests-targetting-specific-branch/6082/6
kushaldas pushed a commit that referenced this issue Sep 25, 2019