
Prevent automated submissions (CAPTCHA) #3

Closed
klpwired opened this issue May 14, 2013 · 30 comments
Labels
help wanted, stale

Comments

@klpwired
Contributor

Changed the name due to debate over whether a CAPTCHA is the right method to use. Consensus is on developing a responsive approach that combines several methods, including CAPTCHAs.

@stereoscott

A little note if I may: mobile submissions + autocorrect on a phone make CAPTCHAs quite frustrating... I realize you may not want to use a third-party system, but something like areyouahuman, which uses a game to prevent bot submissions, is worth a quick look (and they have a Python library).

@bitsteak
Contributor

Something like hashcash, a proof-of-work system, might be a better option.
http://www.hashcash.org/blog/
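For illustration, here is a minimal sketch of the hashcash idea in Python. The function names and difficulty parameter are invented for the example, not taken from SecureDrop or hashcash's reference code: the server issues a random nonce, the client burns CPU to find a counter whose hash meets a difficulty target, and the server verifies the work with a single hash.

```python
import hashlib
import os

def make_challenge(bits=20):
    """Server: issue a random nonce plus a difficulty target."""
    return os.urandom(16).hex(), bits

def solve(nonce, bits):
    """Client: brute-force a counter until SHA-256(nonce + counter)
    begins with `bits` zero bits. Expected cost doubles per extra bit."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{nonce}{counter}".encode()).hexdigest()
        if int(digest, 16) >> (256 - bits) == 0:
            return counter
        counter += 1

def verify(nonce, counter, bits):
    """Server: a single hash suffices to check the claimed work."""
    digest = hashlib.sha256(f"{nonce}{counter}".encode()).hexdigest()
    return int(digest, 16) >> (256 - bits) == 0
```

The asymmetry (expensive to solve, cheap to verify) is what throttles bulk submissions without giving a human source a puzzle to solve.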

@ioerror

ioerror commented May 16, 2013

Why do you need a CAPTCHA?

@bitsteak
Contributor

We don't want someone using a bot to script junk submissions.

@ioerror

ioerror commented May 16, 2013

I suggest that rather than making it harder to upload, find a way to make it easier to sort out lots of irrelevant stuff on the back end. Every time a source has to jump through a hoop, they will probably reconsider their efforts, and the site's efforts, I might add.

@djon3s

djon3s commented May 16, 2013

@ioerror As I understand it, a captcha is one part of preventing a crippling attack given the current architecture. And that is the difference in ease between submitting a thing and decrypting/sorting it. Even with a captcha this difference will be there, but it will bring the multiple down to something like 1 to 10 in terms of time, instead of 1 to 1000s. I do agree that addressing the sorting mechanisms for the journalists will also be important, though.

@ioerror

ioerror commented May 16, 2013

djon3s:

@ioerror A captcha is one way to prevent a crippling attack given the current architecture. And that is the difference in ease between submitting it and sorting it.

I understand the general idea, though I think CAPTCHAs are generally flawed. Jonathan Wilkins's research on the topic is a classic example of someone automating an attack that would work against most deployed CAPTCHA systems.

In my experience, while such an attack may occur, submitters who have a hard time are universally impacted: if you ever have a submission that is legitimate, any fence-post security may completely discourage them.

If the system is run well, unless you're getting endlessly attacked, I'd suggest you not set out to solve this problem by just throwing a CAPTCHA at it. The human involved in the submission process may be adversely affected enough to discourage them from submitting the document.

I'd rather find an intern to sort through data on the back end than have some important data not uploaded.

@dolanjs
Contributor

dolanjs commented May 16, 2013

The issue is not so much sorting out the backend but preventing/slowing down automated attacks. When not tracking IP or other identifiable information about the source traffic, how do you do any type of throttling that doesn't just DoS legit users?

@ioerror

ioerror commented May 16, 2013

I understand the general problem. I do think that your proposed solution creates a social problem that is in conflict with the general goals - that is - people sending data that is wanted may be unsure, or may quit when faced with any difficulty at all.

If anything, many uploads on the same connection may be given the same identifier - so you can correlate uploads by connection, for example, rather than by IP. This doesn't do anything other than link submissions, so it isn't privacy-violating, and generally only abusers will be affected.

In any case, I understand that there is a problem that may occur and that CAPTCHAs are the proposed solution. Some attackers do very well against CAPTCHAs, so I'll ask a counter question - what will you do when the CAPTCHA is broken? When that answer is obvious, we'll have a suitable answer that likely satisfies everyone.

@Taipo

Taipo commented Oct 12, 2013

Part of the issue is the low available bandwidth on a hidden service connection, which makes it relatively easy to pin a webserver to the wall if it allows file uploads and large POST data volumes, using elementary methods of denial-of-service attacking. Just ask all those Anons who attack hidden service sites as part of Op[Add your name here]. When they eventually give up on the GET/POST flood attacks, resources are freed up and the site's accessibility returns, because they haven't overwhelmed the server resources, just the data connection, which in the case of Tor can be as low as 30 kbps.

Yeah, I completely agree about the problems with using captchas; however, the continued unavailability of a secure drop can also lead to social problems. Maybe Zooko needs to be called on to ding his triangle?

@boite

boite commented Oct 15, 2013

CAPTCHA is a very poor solution to this tricky problem. An adversary who is determined to automate submissions doesn't have to work very much harder to defeat CAPTCHAs. At best, CAPTCHA may prevent fully automatic submissions, but it certainly won't prevent semi-automatic submissions having a human in the loop to solve each CAPTCHA puzzle.

The reduction in usability and accessibility which results from CAPTCHA is not worth the very small additional protection it provides.

In any event, is the problem 'prevent automated submissions' or is it better stated as 'prevent resource exhaustion'?

@fpietrosanti

At GlobaLeaks we are also going to integrate a CAPTCHA as a measure to make it a little bit more difficult to implement an automated flood of submissions: globaleaks/globaleaks-whistleblowing-software#189.

That's not going to stop a knowledgeable adversary; that requires a multi-layer approach to preventing a submission flood, including additional round-trips and increasing delays, properly managed in the UI but creating inefficiency for the flooder.

@fpietrosanti

At GlobaLeaks we just finished writing a proposal to implement submission flood resiliency, which may be of help:
https://docs.google.com/document/d/1P-uHM5K3Hhe_KD6YvARbRTuqjVOVj0VkI7qPO9aWFQw/edit?usp=sharing

@trevortimm
Contributor

@fpietrosanti How many times in practice have media organizations running GlobaLeaks run into this problem? Is it still theoretical at this point?

@fpietrosanti

@trevortimm It's already the 2nd time that has happened, always "before" the release of a story or "after" news came out. In the previous case there were tons of new files being uploaded to existing submissions; in this case (the flood is ongoing now) there are many new submissions being created.

The end-user effect is that:

  • Receivers get tons of notifications
  • The Tip List in the UI is very long and difficult to deal with

However, given the slowness of Tor hidden services, we have never seen a resource exhaustion attack.

@garrettr
Contributor

garrettr commented Nov 4, 2013

Something like hashcash, a proof-of-work system, might be a better option.

@bitsteak We would need to require users to have Javascript enabled to do this, which is counter to our current strategy (#100, #101). Also, natively coded bots would always have a speed/parallelism advantage, although we could probably do OK with asm.js. However, this is a cool idea, and may be something to consider in the future if we dispense with the strict no-JS stance.

@garrettr
Contributor

garrettr commented Nov 4, 2013

Every time a source has to jump through a hoop, they will probably reconsider their efforts and the site's efforts, I might add.

@ioerror Our sources already have to jump through the significant hoop of downloading and installing the Tor Browser Bundle to access our site. A per-submission CAPTCHA is far less inconvenient.

Availability to sources is an important goal, but so is availability to journalists. You are assuming that journalists have unlimited resources (time, interns) to throw at this problem, which would be inefficient even if it were true. I also predict it would discourage journalists from using the site, and there is always the possibility that legitimate submissions will be lost in the DoS flood.

@garrettr
Contributor

garrettr commented Nov 4, 2013

We will need to use several tactics in tandem to defeat this problem (@fpietrosanti's outline above is a great overview). I propose an evolutionary approach.

We should start by implementing a time-delay based approach. We can timestamp the request in the auth cookie, and drop submissions that arrive less than X seconds later. Since the submission form requires significant user interaction (using the file picker UI, or typing a message), we can assume that a response returned with a delay of less than several seconds is likely an automated submission and can be dropped (or we can be nice and return a warning, e.g. "Are you a human?", with the option to submit again after a longer timeout). If the site is under attack, we can increase this timeout interval. It would be difficult to make a friendly UI for this without Javascript.
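A minimal Flask sketch of this cookie-timestamp idea; the route names, threshold, and placeholder bodies are hypothetical, not SecureDrop's actual code:

```python
import os
import time

from flask import Flask, abort, session

app = Flask(__name__)
app.secret_key = os.urandom(32)  # Flask signs the session cookie with this

MIN_SUBMIT_SECONDS = 5  # could be raised while the site is under attack

@app.route("/lookup")
def lookup():
    # Stamp the time the form was served into the signed session cookie.
    session["form_served_at"] = time.time()
    return "...render the submission form here..."

@app.route("/submit", methods=["POST"])
def submit():
    served_at = session.pop("form_served_at", None)
    if served_at is None or time.time() - served_at < MIN_SUBMIT_SECONDS:
        # Faster than a human could plausibly fill in the form.
        abort(429)
    return "...store the submission here..."
```

Because the cookie is signed, a client can't forge the timestamp; but as liliakai's commit message below notes, a saved cookie can still be replayed, so server-side state is needed to close that hole.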

I think we should also implement CAPTCHAs. We should have them pref'd off by default, but set up so they can be quickly enabled if the site is under attack. Perhaps these countermeasure toggles could be exposed in the journalist's web interface, so they can take action as soon as they start to see unusual activity.

Finally, we need to improve the journalist's interface so they can more easily review, sort, and delete submissions and collections of submissions.

@Taipo

Taipo commented Nov 5, 2013

It would be difficult to make a friendly UI for this without Javascript.

As @fpietrosanti pointed out, resource exhaustion is currently quite a difficult task against Tor-hosted webservers. So not every request type needs to be tracked and 'counted'; tracking POST requests alone would be sufficient to prevent mass submissions and dictionary attacks against codenames. Why would you need to use javascript?

@garrettr
Contributor

garrettr commented Nov 5, 2013

Why would you need to use javascript?

I was referring to a potential UI where, if a submission is made "too quickly", we return a page saying "You submitted that faster than we expected - are you a human?". Stack Overflow does this. To handle false positives, we should give the option to resubmit after a longer timeout, e.g. "you can try submitting again in 30 seconds". I'm saying that such an interface would be much more friendly if it had some kind of visible countdown on the page, which cannot be implemented without JS.

@Taipo

Taipo commented Nov 5, 2013

@garrettr The friendliness of that sort of UI might be grounds to use javascript in that instance alone, since the flood scenario should not be the experience of your average source, who, unless they were in a mad hurry to get files uploaded, would rarely break a flood rule. Whatever the case, the cost in resources must be low.

But returning to the grounds for CAPTCHAs, first off I find them hideous and try to avoid services that use them where possible.

Using CAPTCHA would be the quick and easy method of defeating large-scale DoS attacks where the attack is coming from many connections to many codenames (a botnet attack), but would not be effective on its own in the long term against a well-resourced attacker.

Sessions are another way of dealing with this, as you and others pointed out, but creating a session hash based on the codename is not enough (considering there is only one IP address, 127.0.0.1, for all connections). It would have to be based on the actual connection itself, as pointed out by @ioerror, to limit both a single attacker using multiple pre-registered codenames and a single attacker making multiple POSTs/file uploads against a single codename.

So I propose an alternative multi-prong approach to this.

BOTNET attacks:

  • Connection rate per minute monitoring (let's say 100 connections per minute, regardless of how many sources are behind the connections)
  • Where the rate limit is exceeded, globally trigger the CAPTCHA for anyone making submissions during a detected flood. This should slow or stop a BOTNET attack automatically.

Single Attacker Based Attack:

  • Connection-based sessions to capture floods from single connections. Connection hash(x) is allowed to make y submissions per minute; trigger the CAPTCHA on an attacker who breaks these rules. Keep in mind that a single attacker could pre-register 1000 codenames and make one submission every 0.03 seconds, toggling through each codename, and still not break the rules of your average flood protection method.

It is a work in progress, something I have been thinking about over the last week. I will give it some more thought and get back to you with any alterations to the concept.
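A sketch of the global trigger described above, with made-up thresholds (a real deployment would also have to share this state across worker processes):

```python
import time
from collections import deque

class FloodDetector:
    """Sliding one-minute window over request timestamps. When the
    global rate is exceeded, the CAPTCHA requirement switches on for
    everyone until the flood subsides."""

    def __init__(self, max_per_minute=100):
        self.max_per_minute = max_per_minute
        self.events = deque()

    def record(self):
        now = time.time()
        self.events.append(now)
        # Drop timestamps that have aged out of the window.
        while self.events and self.events[0] < now - 60:
            self.events.popleft()

    def captcha_required(self):
        return len(self.events) > self.max_per_minute
```

record() would be called on every POST, and captcha_required() consulted when rendering the submission form.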

@garrettr
Contributor

garrettr commented Nov 5, 2013

Nice writeup, @Taipo.

@fpietrosanti

@Taipo @garrettr I do think that in any case we should never "block" incoming connections but always throttle or delay them, enabling the whistleblower to make a submission anyway. That's because, as said by @Taipo, we don't have the source IP, and whoever is mounting a DoS has a big advantage.

The first goal in handling a flood is to avoid DoSing ourselves, so there's no 100% resilient action, only a set of many actions that can be taken to make the flood "less annoying" for the journos and the system. But we need to do that without "blocking functionality".

@heartsucker
Contributor

Working on this issue. Creating a Redis store that tracks counts of web events for the past 30 minutes and looks for spikes in activity. Spikes will trigger an email for now, and later will turn on CAPTCHAs on the site.
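Something along these lines, sketched with per-minute Redis counters; the key scheme and spike heuristic are illustrative, not the actual implementation:

```python
import time

import redis

r = redis.StrictRedis()
WINDOW_MINUTES = 30

def record_event(kind):
    """Increment this minute's counter; buckets expire on their own."""
    bucket = int(time.time() // 60)
    key = f"events:{kind}:{bucket}"
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, WINDOW_MINUTES * 60)
    pipe.execute()

def spike_detected(kind, factor=5):
    """Compare the current minute against the 30-minute average."""
    now = int(time.time() // 60)
    keys = [f"events:{kind}:{b}" for b in range(now - WINDOW_MINUTES, now + 1)]
    counts = [int(v) if v else 0 for v in r.mget(keys)]
    current, history = counts[-1], counts[:-1]
    baseline = max(1.0, sum(history) / len(history))
    return current > factor * baseline
```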

liliakai added a commit to liliakai/securedrop that referenced this issue Nov 11, 2013
Basic submission throttling for DoS prevention (freedomofpress#3). Only throttles
msg/doc submission. Does not defend against replay attacks. Minimum
submission interval is configurable in seconds.
@liliakai
Contributor

What does everyone think about putting a CAPTCHA on codename registration but not on document submission, then rate-limiting submissions per codename? These features can both be configurable or triggered on global DoS detection. This may help mitigate the pre-registered codename attack mentioned by @Taipo.

Also, if @ehartsuyker is adding Redis for global detection, then it can be used similarly for per-codename detection, à la http://flask.pocoo.org/snippets/70/ (as opposed to using a client-side cookie, which could be saved and reused by a clever attacker; thanks to @Hainish for pointing that out).
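A per-codename variant in the same spirit as that snippet; the key scheme and limits here are hypothetical. Keeping the counter server-side in Redis is what defeats the saved-cookie replay:

```python
import time

import redis

r = redis.StrictRedis()

def over_limit(codename_id, limit=5, per_seconds=60):
    """Fixed-window rate limit keyed on an identifier derived from the
    codename. `codename_id` should be a hash of the codename, not the
    codename itself, so no source identifier is stored in Redis."""
    window = int(time.time() // per_seconds)
    key = f"ratelimit:{codename_id}:{window}"
    pipe = r.pipeline()
    pipe.incr(key)               # count this submission in the window
    pipe.expire(key, per_seconds * 2)  # let old windows clean themselves up
    count, _ = pipe.execute()
    return count > limit
```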

@heartsucker
Contributor

I have moved away from using Redis and am keeping everything in memory as a dict.

@garrettr
Contributor

@liliakai Nice find on that throttling snippet! That could be very useful. I think if we were to use CAPTCHAs they should be on both registering codenames and submitting files - but I don't think we should have CAPTCHAs enabled for either of those pages by default.

dolanjs added a commit that referenced this issue Jun 2, 2014
Display the journalist's 2fa doc interface code
garrettr pushed a commit that referenced this issue Oct 29, 2014
garrettr pushed a commit that referenced this issue May 19, 2016
@redshiftzero added the "help wanted" label Aug 19, 2017
@heartsucker
Contributor

Marking as "Pending close" as this either requires a 3rd party (unacceptable) or adding a lib (something we try to avoid, to keep the app lean in the name of security / auditability).

If we look at the current state of CAPTCHAs, using Google and Cloudflare as the prime examples, they use a complex system of identifying images that are unlikely to be repeated for a given user. This is not something SD could support. The other "I'm not a robot" mouse-tracking-click-thing is also not an option, because 1) JavaScript, and 2) it requires all the fancy ML stuff to match it to known human patterns. Also not easy to support. That leaves us with some Python libs that generate messy text for humans to decipher. Getting the complexity high enough that a computer can't beat it makes it too hard for humans too these days. This was a realistic option for SD in 2013, but I think it's not anymore.

Lastly, if someone really wants to overload SD, Tor is likely to be the bottleneck. Someone could script the submission, still have a human solve the captcha, and be able to overload the network and/or fill the disk in a day or two.

@conorsch
Contributor

Great summary, @heartsucker. I'll add that to date we have not received any support requests from organizations running SecureDrop in production to add a CAPTCHA. So far, all the discussion toward this feature has assumed that organizations demand it—but we haven't heard that yet.

@ioerror's point from years ago:

I suggest that rather than making it harder to upload, find a way to make it easier to sort out lots of irrelevant stuff on the back end.

Still rings true to my ear, and jibes well with the workstation refresh we're looking at.

@redshiftzero
Contributor

I agree with @heartsucker in that it is challenging to devise a CAPTCHA that both:

  • uses no third parties and
  • wouldn't be trivial to defeat for an attacker capable of using e.g. Selenium to automate submissions to SecureDrop

The better solution is indeed to enable journalists to quickly sort through submissions (either manually, via a smoother journalist workflow on the Qubes workstation, or in the future potentially by learning characteristics of spam submissions). Since a CAPTCHA isn't the right approach here, I'm closing.

rmol added a commit to rmol/securedrop that referenced this issue Jun 4, 2019
CircleCI's branch filtering does not work properly with pull requests
from forks. The CIRCLE_BRANCH variable contains something like freedomofpress/pull/3
in this case. This means that docs and i18n PRs from forks are not
being tested as we wish; the translation tests are not run for i18n
PRs, and the app tests are being run for docs PRs.

A CircleCI feature request [1] to improve this was closed without
explanation, so I'm incorporating a workaround suggested in the
CircleCI forums: using the GitHub API to obtain the real branch name
for PRs from forks, and skipping tests if it doesn't match. Not all
steps of the relevant jobs are skipped, but the most expensive ones
are.

Also, stop skipping static-analysis-and-no-known-cves, as it doesn't
take that long, and might prevent problems from sneaking in on
branches with inaccurate names.

[1] https://discuss.circleci.com/t/only-build-pull-requests-targetting-specific-branch/6082/6
kushaldas pushed a commit that referenced this issue Sep 25, 2019