-
-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify regex for user/community name validation #5164
Changes from 5 commits
eef51e3
1d08651
982b331
34ee03f
51e9586
dd92f37
6c70cec
e2cfb1f
67c74a0
28b7f08
3d65dd3
8d0a4b8
686ae38
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -84,27 +84,37 @@ fn has_newline(name: &str) -> bool { | |
name.contains('\n') | ||
} | ||
|
||
pub fn is_valid_actor_name(name: &str, actor_name_max_length: usize) -> LemmyResult<()> { | ||
static VALID_ACTOR_NAME_REGEX_EN: LazyLock<Regex> = | ||
LazyLock::new(|| Regex::new(r"^[a-zA-Z0-9_]{3,}$").expect("compile regex")); | ||
static VALID_ACTOR_NAME_REGEX_AR: LazyLock<Regex> = | ||
LazyLock::new(|| Regex::new(r"^[\p{Arabic}0-9_]{3,}$").expect("compile regex")); | ||
static VALID_ACTOR_NAME_REGEX_RU: LazyLock<Regex> = | ||
LazyLock::new(|| Regex::new(r"^[\p{Cyrillic}0-9_]{3,}$").expect("compile regex")); | ||
pub fn is_valid_username(name: &str, actor_name_max_length: usize) -> LemmyResult<()> { | ||
// Only allow characters from a single alphabet per username. This avoids problems with lookalike | ||
// characters like `o` which looks identical in Latin and Cyrillic, and can be used to imitate | ||
// other users. Checks for additional alphabets can be added in the same way. | ||
static VALID_USERNAME_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
Regex::new(r"^(?:[a-zA-Z0-9_]{3,}|[0-9_\p{Arabic}]{3,}|[0-9_\p{Cyrillic}]{3,})$") | ||
.expect("compile regex") | ||
}); | ||
|
||
let check = name.chars().count() <= actor_name_max_length && !has_newline(name); | ||
is_valid_actor_name(name, actor_name_max_length, &VALID_USERNAME_REGEX) | ||
} | ||
|
||
pub fn is_valid_community_name(name: &str, actor_name_max_length: usize) -> LemmyResult<()> { | ||
// Only allow characters from a single alphabet per username. This avoids problems with lookalike | ||
// characters like `o` which looks identical in Latin and Cyrillic, and can be used to imitate | ||
// other users. Checks for additional alphabets can be added in the same way. | ||
let lang_check = VALID_ACTOR_NAME_REGEX_EN.is_match(name) | ||
|| VALID_ACTOR_NAME_REGEX_AR.is_match(name) | ||
|| VALID_ACTOR_NAME_REGEX_RU.is_match(name); | ||
static VALID_COMMUNITY_NAME_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
Regex::new( | ||
r"^(?:[0-9_a-z]{3,}|[0-9_[\p{Arabic}]&&\P{Lu}&&\P{Lt}]{3,}|[0-9_[\p{Cyrillic}&&\P{Lu}&&\P{Lt}]]{3,})$", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Omitting uppercase and titlecase characters. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why should uppercase be allowed for usernames but not for communities? It makes sense that both are consistent, and then we dont need separate regexes, functions and tests. And same thing about the length check, I would also use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I was going based off of the comment change from the related UI PR. Before I made the fix in that PR, the only letters the frontend allowed were lowercase ASCII. I made it so that Arabic and Cyrillic would pass as well, but was still unclear on the lowercase character constraint. I have no strong opinion on whether community actor names should or shouldn't allow uppercase letters. I just want to make sure that the username and community name validation don't differ between frontend and backend, and I assume the old frontend community name constraint was made with intent. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here is the change for disallowing uppercase characters, but it was reverted a while later. For api tests we generate random usernames and community names including uppercase characters. So the backend works fine with uppercase, but maybe there would be problems in lemmy-ui. Anyway I cant find any reason why the rules should be different for username and community name, probably just an oversight. |
||
) | ||
.expect("compile regex") | ||
}); | ||
|
||
if !check || !lang_check { | ||
Err(LemmyErrorType::InvalidName.into()) | ||
} else { | ||
is_valid_actor_name(name, actor_name_max_length, &VALID_COMMUNITY_NAME_REGEX) | ||
} | ||
|
||
fn is_valid_actor_name(name: &str, actor_name_max_length: usize, r: &Regex) -> LemmyResult<()> { | ||
if name.len() <= actor_name_max_length && r.is_match(name) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The |
||
Ok(()) | ||
} else { | ||
Err(LemmyErrorType::InvalidName.into()) | ||
} | ||
} | ||
|
||
|
@@ -362,12 +372,13 @@ mod tests { | |
clean_url, | ||
clean_urls_in_text, | ||
is_url_blocked, | ||
is_valid_actor_name, | ||
is_valid_bio_field, | ||
is_valid_community_name, | ||
is_valid_display_name, | ||
is_valid_matrix_id, | ||
is_valid_post_title, | ||
is_valid_url, | ||
is_valid_username, | ||
site_name_length_check, | ||
site_or_community_description_length_check, | ||
BIO_MAX_LENGTH, | ||
|
@@ -421,23 +432,62 @@ mod tests { | |
} | ||
|
||
#[test] | ||
fn test_valid_actor_name() { | ||
fn test_valid_username() { | ||
let actor_name_max_length = 20; | ||
assert!(is_valid_actor_name("Hello_98", actor_name_max_length).is_ok()); | ||
assert!(is_valid_actor_name("ten", actor_name_max_length).is_ok()); | ||
assert!(is_valid_actor_name("تجريب", actor_name_max_length).is_ok()); | ||
assert!(is_valid_actor_name("تجريب_123", actor_name_max_length).is_ok()); | ||
assert!(is_valid_actor_name("Владимир", actor_name_max_length).is_ok()); | ||
assert!(is_valid_username("Hello_98", actor_name_max_length).is_ok()); | ||
assert!(is_valid_username("ten", actor_name_max_length).is_ok()); | ||
assert!(is_valid_username("تجريب", actor_name_max_length).is_ok()); | ||
assert!(is_valid_username("تجريب_123", actor_name_max_length).is_ok()); | ||
assert!(is_valid_username("Владимир", actor_name_max_length).is_ok()); | ||
|
||
// mixed scripts | ||
assert!(is_valid_actor_name("تجريب_abc", actor_name_max_length).is_err()); | ||
assert!(is_valid_actor_name("Влад_abc", actor_name_max_length).is_err()); | ||
assert!(is_valid_username("تجريب_abc", actor_name_max_length).is_err()); | ||
assert!(is_valid_username("Влад_abc", actor_name_max_length).is_err()); | ||
// dash | ||
assert!(is_valid_actor_name("Hello-98", actor_name_max_length).is_err()); | ||
assert!(is_valid_username("Hello-98", actor_name_max_length).is_err()); | ||
// too short | ||
assert!(is_valid_actor_name("a", actor_name_max_length).is_err()); | ||
assert!(is_valid_username("a", actor_name_max_length).is_err()); | ||
// empty | ||
assert!(is_valid_actor_name("", actor_name_max_length).is_err()); | ||
assert!(is_valid_username("", actor_name_max_length).is_err()); | ||
// newline | ||
assert!(is_valid_username( | ||
r"Line1 | ||
|
||
Line3", | ||
actor_name_max_length | ||
) | ||
.is_err()); | ||
} | ||
|
||
#[test] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx for adding these tests. I know the regex should reject newlines now, but it'd probably be good to add tests for |
||
fn test_valid_community_name() { | ||
let actor_name_max_length = 20; | ||
assert!(is_valid_community_name("hello_98", actor_name_max_length).is_ok()); | ||
assert!(is_valid_community_name("ten", actor_name_max_length).is_ok()); | ||
assert!(is_valid_community_name("تجريب", actor_name_max_length).is_ok()); | ||
assert!(is_valid_community_name("تجريب_123", actor_name_max_length).is_ok()); | ||
assert!(is_valid_community_name("владимир", actor_name_max_length).is_ok()); | ||
|
||
// uppercase | ||
assert!(is_valid_community_name("Ten", actor_name_max_length).is_err()); | ||
assert!(is_valid_community_name("Владимир", actor_name_max_length).is_err()); | ||
// mixed scripts | ||
assert!(is_valid_community_name("تجريب_abc", actor_name_max_length).is_err()); | ||
assert!(is_valid_community_name("Влад_abc", actor_name_max_length).is_err()); | ||
// dash | ||
assert!(is_valid_community_name("hello-98", actor_name_max_length).is_err()); | ||
// too short | ||
assert!(is_valid_community_name("a", actor_name_max_length).is_err()); | ||
// empty | ||
assert!(is_valid_community_name("", actor_name_max_length).is_err()); | ||
// newline | ||
assert!(is_valid_community_name( | ||
r"Line1 | ||
|
||
Line3", | ||
actor_name_max_length | ||
) | ||
.is_err()); | ||
} | ||
|
||
#[test] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently not using
\w
or\d
because the regex crate adds non-ASCII meaning to them. Let me know if this is important to consider for an allowable name.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex would be simpler by doing the length check separately with
min_length_check()
.