Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify regex for user/community name validation #5164

Merged
merged 13 commits into from
Nov 12, 2024
4 changes: 2 additions & 2 deletions crates/api_crud/src/community/create.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,8 +37,8 @@ use lemmy_utils::{
utils::{
slurs::check_slurs,
validation::{
is_valid_actor_name,
is_valid_body_field,
is_valid_community_name,
site_or_community_description_length_check,
},
},
Expand Down Expand Up @@ -80,7 +80,7 @@ pub async fn create_community(
let banner = diesel_url_create(data.banner.as_deref())?;
let banner = proxy_image_link_api(banner, &context).await?;

is_valid_actor_name(&data.name, local_site.actor_name_max_length as usize)?;
is_valid_community_name(&data.name, local_site.actor_name_max_length as usize)?;

// Double check for duplicate community actor_ids
let community_actor_id = generate_local_apub_endpoint(
Expand Down
4 changes: 2 additions & 2 deletions crates/api_crud/src/user/create.rs
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ use lemmy_utils::{
error::{LemmyError, LemmyErrorExt, LemmyErrorType, LemmyResult},
utils::{
slurs::{check_slurs, check_slurs_opt},
validation::is_valid_actor_name,
validation::is_valid_username,
},
};
use serde::{Deserialize, Serialize};
Expand Down Expand Up @@ -407,7 +407,7 @@ async fn create_person(
context: &Data<LemmyContext>,
) -> Result<Person, LemmyError> {
let actor_keypair = generate_actor_keypair()?;
is_valid_actor_name(&username, local_site.actor_name_max_length as usize)?;
is_valid_username(&username, local_site.actor_name_max_length as usize)?;
let actor_id = generate_local_apub_endpoint(
EndpointType::Person,
&username,
Expand Down
102 changes: 76 additions & 26 deletions crates/utils/src/utils/validation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -84,27 +84,37 @@ fn has_newline(name: &str) -> bool {
name.contains('\n')
}

pub fn is_valid_actor_name(name: &str, actor_name_max_length: usize) -> LemmyResult<()> {
static VALID_ACTOR_NAME_REGEX_EN: LazyLock<Regex> =
LazyLock::new(|| Regex::new(r"^[a-zA-Z0-9_]{3,}$").expect("compile regex"));
static VALID_ACTOR_NAME_REGEX_AR: LazyLock<Regex> =
LazyLock::new(|| Regex::new(r"^[\p{Arabic}0-9_]{3,}$").expect("compile regex"));
static VALID_ACTOR_NAME_REGEX_RU: LazyLock<Regex> =
LazyLock::new(|| Regex::new(r"^[\p{Cyrillic}0-9_]{3,}$").expect("compile regex"));
pub fn is_valid_username(name: &str, actor_name_max_length: usize) -> LemmyResult<()> {
// Only allow characters from a single alphabet per username. This avoids problems with lookalike
// characters like `o` which looks identical in Latin and Cyrillic, and can be used to imitate
// other users. Checks for additional alphabets can be added in the same way.
static VALID_USERNAME_REGEX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r"^(?:[a-zA-Z0-9_]{3,}|[0-9_\p{Arabic}]{3,}|[0-9_\p{Cyrillic}]{3,})$")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently not using \w or \d because the regex crate adds non-ASCII meaning to them. Let me know if this is important to consider for an allowable name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex would be simpler by doing the length check separately with min_length_check().

.expect("compile regex")
});

let check = name.chars().count() <= actor_name_max_length && !has_newline(name);
is_valid_actor_name(name, actor_name_max_length, &VALID_USERNAME_REGEX)
}

pub fn is_valid_community_name(name: &str, actor_name_max_length: usize) -> LemmyResult<()> {
// Only allow characters from a single alphabet per username. This avoids problems with lookalike
// characters like `o` which looks identical in Latin and Cyrillic, and can be used to imitate
// other users. Checks for additional alphabets can be added in the same way.
let lang_check = VALID_ACTOR_NAME_REGEX_EN.is_match(name)
|| VALID_ACTOR_NAME_REGEX_AR.is_match(name)
|| VALID_ACTOR_NAME_REGEX_RU.is_match(name);
static VALID_COMMUNITY_NAME_REGEX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r"^(?:[0-9_a-z]{3,}|[0-9_[\p{Arabic}]&&\P{Lu}&&\P{Lt}]{3,}|[0-9_[\p{Cyrillic}&&\P{Lu}&&\P{Lt}]]{3,})$",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omitting uppercase and titlecase characters.

Copy link
Member

@Nutomic Nutomic Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should uppercase be allowed for usernames but not for communities? It makes sense that both are consistent, and then we dont need separate regexes, functions and tests.

And same thing about the length check, I would also use .is_lowercase() in Rust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should uppercase be allowed for usernames but not for communities? It makes sense that both are consistent, and then we dont need separate regexes, functions and tests.

I was going based off of the comment change from the related UI PR. Before I made the fix in that PR, the only letters the frontend allowed were lowercase ASCII. I made it so that Arabic and Cyrillic would pass as well, but was still unclear on the lowercase character constraint.

I have no strong opinion on whether community actor names should or shouldn't allow uppercase letters. I just want to make sure that the username and community name validation don't differ between frontend and backend, and I assume the old frontend community name constraint was made with intent.

Copy link
Member

@Nutomic Nutomic Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the change for disallowing uppercase characters, but it was reverted a while later. For api tests we generate random usernames and community names including uppercase characters. So the backend works fine with uppercase, but maybe there would be problems in lemmy-ui. Anyway I cant find any reason why the rules should be different for username and community name, probably just an oversight.

)
.expect("compile regex")
});

if !check || !lang_check {
Err(LemmyErrorType::InvalidName.into())
} else {
is_valid_actor_name(name, actor_name_max_length, &VALID_COMMUNITY_NAME_REGEX)
}

fn is_valid_actor_name(name: &str, actor_name_max_length: usize, r: &Regex) -> LemmyResult<()> {
if name.len() <= actor_name_max_length && r.is_match(name) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The has_newline check ended up being superfluous as it was already covered by the regex.

Ok(())
} else {
Err(LemmyErrorType::InvalidName.into())
}
}

Expand Down Expand Up @@ -362,12 +372,13 @@ mod tests {
clean_url,
clean_urls_in_text,
is_url_blocked,
is_valid_actor_name,
is_valid_bio_field,
is_valid_community_name,
is_valid_display_name,
is_valid_matrix_id,
is_valid_post_title,
is_valid_url,
is_valid_username,
site_name_length_check,
site_or_community_description_length_check,
BIO_MAX_LENGTH,
Expand Down Expand Up @@ -421,23 +432,62 @@ mod tests {
}

#[test]
fn test_valid_actor_name() {
fn test_valid_username() {
let actor_name_max_length = 20;
assert!(is_valid_actor_name("Hello_98", actor_name_max_length).is_ok());
assert!(is_valid_actor_name("ten", actor_name_max_length).is_ok());
assert!(is_valid_actor_name("تجريب", actor_name_max_length).is_ok());
assert!(is_valid_actor_name("تجريب_123", actor_name_max_length).is_ok());
assert!(is_valid_actor_name("Владимир", actor_name_max_length).is_ok());
assert!(is_valid_username("Hello_98", actor_name_max_length).is_ok());
assert!(is_valid_username("ten", actor_name_max_length).is_ok());
assert!(is_valid_username("تجريب", actor_name_max_length).is_ok());
assert!(is_valid_username("تجريب_123", actor_name_max_length).is_ok());
assert!(is_valid_username("Владимир", actor_name_max_length).is_ok());

// mixed scripts
assert!(is_valid_actor_name("تجريب_abc", actor_name_max_length).is_err());
assert!(is_valid_actor_name("Влад_abc", actor_name_max_length).is_err());
assert!(is_valid_username("تجريب_abc", actor_name_max_length).is_err());
assert!(is_valid_username("Влад_abc", actor_name_max_length).is_err());
// dash
assert!(is_valid_actor_name("Hello-98", actor_name_max_length).is_err());
assert!(is_valid_username("Hello-98", actor_name_max_length).is_err());
// too short
assert!(is_valid_actor_name("a", actor_name_max_length).is_err());
assert!(is_valid_username("a", actor_name_max_length).is_err());
// empty
assert!(is_valid_actor_name("", actor_name_max_length).is_err());
assert!(is_valid_username("", actor_name_max_length).is_err());
// newline
assert!(is_valid_username(
r"Line1

Line3",
actor_name_max_length
)
.is_err());
}

#[test]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for adding these tests. I know the regex should reject newlines now, but it'd probably be good to add tests for \n just in case.

fn test_valid_community_name() {
let actor_name_max_length = 20;
assert!(is_valid_community_name("hello_98", actor_name_max_length).is_ok());
assert!(is_valid_community_name("ten", actor_name_max_length).is_ok());
assert!(is_valid_community_name("تجريب", actor_name_max_length).is_ok());
assert!(is_valid_community_name("تجريب_123", actor_name_max_length).is_ok());
assert!(is_valid_community_name("владимир", actor_name_max_length).is_ok());

// uppercase
assert!(is_valid_community_name("Ten", actor_name_max_length).is_err());
assert!(is_valid_community_name("Владимир", actor_name_max_length).is_err());
// mixed scripts
assert!(is_valid_community_name("تجريب_abc", actor_name_max_length).is_err());
assert!(is_valid_community_name("Влад_abc", actor_name_max_length).is_err());
// dash
assert!(is_valid_community_name("hello-98", actor_name_max_length).is_err());
// too short
assert!(is_valid_community_name("a", actor_name_max_length).is_err());
// empty
assert!(is_valid_community_name("", actor_name_max_length).is_err());
// newline
assert!(is_valid_community_name(
r"Line1

Line3",
actor_name_max_length
)
.is_err());
}

#[test]
Expand Down