Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify regex for user/community name validation #5164

Merged
merged 13 commits into from
Nov 12, 2024
Merged

Conversation

SleeplessOne1917
Copy link
Member

// characters like `o` which looks identical in Latin and Cyrillic, and can be used to imitate
// other users. Checks for additional alphabets can be added in the same way.
static VALID_USERNAME_REGEX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r"^(?:[a-zA-Z0-9_]{3,}|[0-9_\p{Arabic}]{3,}|[0-9_\p{Cyrillic}]{3,})$")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently not using \w or \d because the regex crate adds non-ASCII meaning to them. Let me know if this is important to consider for an allowable name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex would be simpler by doing the length check separately with min_length_check().

|| VALID_ACTOR_NAME_REGEX_RU.is_match(name);
static VALID_COMMUNITY_NAME_REGEX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r"^(?:[0-9_a-z]{3,}|[0-9_[\p{Arabic}]&&\P{Lu}&&\P{Lt}]{3,}|[0-9_[\p{Cyrillic}&&\P{Lu}&&\P{Lt}]]{3,})$",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omitting uppercase and titlecase characters.

Copy link
Member

@Nutomic Nutomic Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should uppercase be allowed for usernames but not for communities? It makes sense that both are consistent, and then we dont need separate regexes, functions and tests.

And same thing about the length check, I would also use .is_lowercase() in Rust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should uppercase be allowed for usernames but not for communities? It makes sense that both are consistent, and then we dont need separate regexes, functions and tests.

I was going based off of the comment change from the related UI PR. Before I made the fix in that PR, the only letters the frontend allowed were lowercase ASCII. I made it so that Arabic and Cyrillic would pass as well, but was still unclear on the lowercase character constraint.

I have no strong opinion on whether community actor names should or shouldn't allow uppercase letters. I just want to make sure that the username and community name validation don't differ between frontend and backend, and I assume the old frontend community name constraint was made with intent.

Copy link
Member

@Nutomic Nutomic Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the change for disallowing uppercase characters, but it was reverted a while later. For api tests we generate random usernames and community names including uppercase characters. So the backend works fine with uppercase, but maybe there would be problems in lemmy-ui. Anyway I cant find any reason why the rules should be different for username and community name, probably just an oversight.

}

fn is_valid_actor_name(name: &str, actor_name_max_length: usize, r: &Regex) -> LemmyResult<()> {
if name.len() <= actor_name_max_length && r.is_match(name) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The has_newline check ended up being superfluous as it was already covered by the regex.

.is_err());
}

#[test]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for adding these tests. I know the regex should reject newlines now, but it'd probably be good to add tests for \n just in case.

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, accidentally clicked approve.

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, just realized approving, then commenting on a review will not unapprove lol. My bad.

@Nutomic Nutomic changed the title Add lowercase only check for community names Simplify regex for user/community name validation Nov 6, 2024
@Nutomic Nutomic enabled auto-merge (squash) November 12, 2024 10:09
@Nutomic Nutomic merged commit a8fb55d into main Nov 12, 2024
2 checks passed
@SleeplessOne1917 SleeplessOne1917 deleted the cyrillic-arabic branch November 12, 2024 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants