Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faker is unsafe to use in automated tests - real urls and email addresses are generated #2431

Closed
patrick-beep opened this issue Dec 20, 2021 · 19 comments · Fixed by #2733
Closed

Comments

@patrick-beep
Copy link

patrick-beep commented Dec 20, 2021

Describe the bug

Faker::Internet.url, and Faker::Internet.email generate potentially real domain names and email addresses. Using potentially real urls and email addresses in automated testing environments is a security vulnerability. Faker is extensively used in automated test environments to generate test data.

To Reproduce

Call Faker::Internet.url and see that the result is a url that could resolve to a real domain name, with a tld of .com.

Expected behavior

A TLD of .test or .example should be used instead. See RFC2606

Additional context

I understand that some users would like to see .com urls in their Faker output. I'd like to suggest that this should be easy to do by opting in to this behaviour instead of it being the default. Perhaps a parameter for specifying the tld when generating an email address or url? The default behaviour however should be safe.

@KarlHeitmann
Copy link
Contributor

I can tackle this issue.

@saiqulhaq
Copy link

@patrick-beep please check #2506

Faker::Internet.email # => "[email protected]"

Faker::Config.internet_safe_mode = true
Faker::Internet.email # => "[email protected]"
Faker::Internet.url # => "http://marks.example/vern"
Faker::Internet.domain_name # => "brown.example"

@stefannibrasil
Copy link
Contributor

We currently have Faker::Internet.safe_email:

Faker::Internet.safe_email #=> "[email protected]"

I assume going with this approach of configuring internet_safe_mode could include adding a deprecation to Faker::Internet.safe_email. Otherwise, I think having both would lead to confusion.

Thoughts?

@Zeragamba
Copy link
Contributor

I concur that #safe_email could then be deprecated after internet_safe mode is enabled

@fbuys
Copy link
Contributor

fbuys commented Sep 28, 2022

I am wondering why we need to add config - if we already have a safe_email method?
And can we not also add a safe_url method.

From my perspective it seems like a smaller change, also more inline with the existing convention perhaps.

@Zeragamba
Copy link
Contributor

@fbuys to be better internet citizens, we want to update Faker to create safe emails by default. Especially since Faker is often used in automated test environments that may accidently spam a real person or system.

@fbuys
Copy link
Contributor

fbuys commented Sep 29, 2022

Oh I got it @Zeragamba, what if we did it the other way around though. Instead we remove safe_email and return safe email addresses with email and then real_world_email would generate "unsafe" email addresses.

@Zeragamba
Copy link
Contributor

I'd still prefer to leave it as #email so that we can keep the api surface clean. plus we can also allow the user to override the safe mode if needed by allowing them to specify a domain to use instead of example or perhaps allow internet_safe: false as needed

@fbuys
Copy link
Contributor

fbuys commented Sep 30, 2022

Oh yes, I like the idea of passing an override in. So I'd vote for: internet_safe: false to turn safe emails off.

@hlascelles
Copy link

We would like this but/and...

We use a "dud domain" checker that would break on these emails. ie, our app prevents users from using a [email protected] or even a [email protected] address.

We do have an internal domain we do allow that is "real", but won't "spam anyone".

Can this feature have a "set what qualifies as a safe domain" config?

eg:

internet_safe: false
internet_safe_domains: ["myinternaldomain.com"]

@saiqulhaq-hh
Copy link

saiqulhaq-hh commented Oct 15, 2022

These are the changes that will happen when my PR is merged
image(https://docs.google.com/spreadsheets/d/1Sec5OcigkhJnlyYd8GoE8kCbs4OyoFOR1oI9fP9c6zY/edit?usp=sharing)

it introduces a few configs too

internet_safe_mode= true / false
internet_safe_mode?
internet_safe_domains= ["zzz", "xxx", "ccc"]
internet_safe_domains

what do you think?
#2506

  • edit, sorry this is a my another GitHub account, same as @saiqulhaq

@stefannibrasil
Copy link
Contributor

I want to revisit this issue at its core before jumping to implementation ideas.

The real issue here is Faker generating real URL domain names and email addresses.

RFC2606 recommends using Reserved Top Level DNS Names as the best current practices for the internet community:

"To reduce the likelihood of conflict and confusion, a few top level domain names are reserved for use in private testing, as examples in documentation, and the like. Depending on the nature of the test or example, it might be best for it to be referencing a TLD permanently reserved for such purposes.

To safely satisfy these needs, four domain names are reserved as listed and described below.

 .test
.example
.invalid
.localhost

".test" is recommended for use in testing of current or new DNS related code.

".example" is recommended for use in documentation or as examples."

See RFC2606 for more details.

So the top-level domain recommended is .test for the generators and .example for documentation.

The option of disabling a "safe" configuration feels odd to me. We are either safe or not. We choose to follow the security recommendations.

And although Faker::Internet.safe_email says it generates an RFC 2606 compliant fake email, it is not entirely true because example. is an actual address that is being used and could be delivered. .test is the only reserved TLD for testing.

I realize this is not a popular choice, and the decision is not only mine to make, but I'd vote for only using .test as the TLD for all URLs and email addresses on Faker. In the docs, we guide the users not to use Faker URLs or email addresses since we would breach the security recommendations.

If users need to test with real users, I recommend using Faker to generate the fake usernames to append to their domain of choice.

We will leave that security choice to them, but as a library many people use, we don't want to ignore the security best practices.

We have "fake" in the name, after all. We shouldn't be generating real URLs and email addresses.

@saiqulhaq-hh
Copy link

yea agreed
if we do that, then we should release a new major version for breaking change indication to faker's users

@fbuys
Copy link
Contributor

fbuys commented Nov 7, 2022

@stefannibrasil thanks for the clear explanation. I agree with what you said. And I think the solution will make faker easier to maintain too.

@stefannibrasil
Copy link
Contributor

stefannibrasil commented Nov 8, 2022

We just released v3 so maybe this one could be released in the beginning of next year 🤔

We also need to add deprecation warnings to use in preparation for the breaking change.

I'd like to hear more from the other maintainers too, @vbrazo @thdaraujo @Zeragamba

@stefannibrasil
Copy link
Contributor

I think the next steps could be adding a deprecation warning in preparation for the release of Faker 4.0.

@stefannibrasil
Copy link
Contributor

Quick update: I'm working on adding the deprecation messages to go out on the next release.

@stefannibrasil
Copy link
Contributor

I've opened up the first PR here, constructive feedback is welcome:

#2709

After it's merged, the Changelog will announce why we are moving in this direction, and hopefully it will be a smooth transition for users.

Once we give enough time for users to switch to use the safe_email and safe_url only, faker 4 will be released with this breaking change.

@stefannibrasil
Copy link
Contributor

We came up with a new approach here: #2733

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment