-
-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify regex for user/community name validation #5164
Conversation
crates/utils/src/utils/validation.rs
Outdated
// characters like `o` which looks identical in Latin and Cyrillic, and can be used to imitate | ||
// other users. Checks for additional alphabets can be added in the same way. | ||
static VALID_USERNAME_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
Regex::new(r"^(?:[a-zA-Z0-9_]{3,}|[0-9_\p{Arabic}]{3,}|[0-9_\p{Cyrillic}]{3,})$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently not using \w
or \d
because the regex crate adds non-ASCII meaning to them. Let me know if this is important to consider for an allowable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex would be simpler by doing the length check separately with min_length_check()
.
crates/utils/src/utils/validation.rs
Outdated
|| VALID_ACTOR_NAME_REGEX_RU.is_match(name); | ||
static VALID_COMMUNITY_NAME_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
Regex::new( | ||
r"^(?:[0-9_a-z]{3,}|[0-9_[\p{Arabic}]&&\P{Lu}&&\P{Lt}]{3,}|[0-9_[\p{Cyrillic}&&\P{Lu}&&\P{Lt}]]{3,})$", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omitting uppercase and titlecase characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should uppercase be allowed for usernames but not for communities? It makes sense that both are consistent, and then we dont need separate regexes, functions and tests.
And same thing about the length check, I would also use .is_lowercase()
in Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should uppercase be allowed for usernames but not for communities? It makes sense that both are consistent, and then we dont need separate regexes, functions and tests.
I was going based off of the comment change from the related UI PR. Before I made the fix in that PR, the only letters the frontend allowed were lowercase ASCII. I made it so that Arabic and Cyrillic would pass as well, but was still unclear on the lowercase character constraint.
I have no strong opinion on whether community actor names should or shouldn't allow uppercase letters. I just want to make sure that the username and community name validation don't differ between frontend and backend, and I assume the old frontend community name constraint was made with intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the change for disallowing uppercase characters, but it was reverted a while later. For api tests we generate random usernames and community names including uppercase characters. So the backend works fine with uppercase, but maybe there would be problems in lemmy-ui. Anyway I cant find any reason why the rules should be different for username and community name, probably just an oversight.
crates/utils/src/utils/validation.rs
Outdated
} | ||
|
||
fn is_valid_actor_name(name: &str, actor_name_max_length: usize, r: &Regex) -> LemmyResult<()> { | ||
if name.len() <= actor_name_max_length && r.is_match(name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The has_newline
check ended up being superfluous as it was already covered by the regex.
crates/utils/src/utils/validation.rs
Outdated
.is_err()); | ||
} | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for adding these tests. I know the regex should reject newlines now, but it'd probably be good to add tests for \n
just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, accidentally clicked approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, just realized approving, then commenting on a review will not unapprove lol. My bad.
This was discussed in a UI PR that allowed Arabic and Cyrillic to be used for actor names.