-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert raw strings to non-raw when fixes add escape sequences (#13294) #13882
Convert raw strings to non-raw when fixes add escape sequences (#13294) #13882
Conversation
|
It would also be good to test triple-quoted strings whose final characters match the outer quotation marks: r"""\""""
r'''\'''' The |
I missed adding the invalid characters to the final test file.
Good points, thank you! |
My previous example was incomplete: quotation marks in triple-quoted strings need escaping when they precede two more of the same quotation mark. This can also happen in the middle of a string: r"""\"""..."""
r'''\'''...''' |
705ac51
to
d4051f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution.
I remain hesitant adding raw to regular string conversion considering how rare raw strings is. The goal with our fixes is to fix the majority of cases. It's okay if Ruff can't fix all cases. In this case, being able to fix 1-2 raw strings doesn't justify the added complexity.
My suggestion is that we simply don't provide a fix if the string is a raw-string or that we make it an unsafe fix.
&& string_content | ||
.as_bytes() | ||
.get(column + 1) | ||
.is_some_and(|c2| char::from(*c2) != c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bytes to get the next characters panics if the next character is a non-ASCII character. We should use the chars
iterator instead (they're cheap clonable)
for (column, match_) in prefix.match_indices(&['r', 'R']) { | ||
let c = match_.chars().next().unwrap(); | ||
|
||
let entire_string_range = match kind { | ||
TokenKind::String => range, | ||
_ => last_fstring_start.unwrap().range(), | ||
}; | ||
let location = entire_string_range.start() + TextSize::try_from(column).unwrap(); | ||
let range = TextRange::at(location, c.text_len()); | ||
|
||
string_conversion_edits.push(Edit::range_deletion(range)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: there can always only be at most one r
or R
prefix. That's why it should not be necessaryto iterate, instead you can use find
(or is it position?) to get the position of the r
or R
character
TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(), | ||
), | ||
_ => (0.into(), text.len().try_into().unwrap()), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use text_len
and return a TextRange
. Returning a text range has the advantage that you can index directly by range
let string_content = &text[range];
TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(), | |
), | |
_ => (0.into(), text.len().try_into().unwrap()), | |
}; | |
text.text_len() - string_flags.quote_len(), | |
), | |
_ => TextRange::new((0.into(), text.text_len()), | |
}; |
diagnostic: Diagnostic::new(rule, range), | ||
// This is integrated with other fixes and attached to the diagnostic below. | ||
edit: Edit::range_replacement(replacement.to_string(), range), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to handle the case where the invalid character was the last character before the quotes in a triple quoted strings:
"""test"<invalid>"""
Let's say <invalid>
is the invalid character. Removing <invalid>
then results in """test""""
which is not a valid non-raw strings.
I kind of agree… imho, the fix should be marked unsafe for raw strings. I can open a new PR for that if preferred. Regardless, this was a good excuse to learn Rust better :) |
I think the rules should either offer safe fixes for raw strings or no fixes. Keeping the current fixes but marking them unsafe doesn’t seem useful, because they are incorrect. |
I agree. We should not offer fixes if we know they're incorrect. We could consider offering fixes for raw-strings if they don't contain any quotes or backslashes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make our life easier and not offer a fix instead as this seems uncommon enough.
Sorry for leaving this hanging. I've intended to scrap this work in favour of not offering to fix raw strings, but life got in the way. Happy to revise this next month. |
No worries and thanks for the heads up. There are no expectations at all. I only requested changes to make it clear that this PR isn't waiting on me. |
@ThatsJustCheesy would you like me to close this PR so you can make a new one with the simpler "skip it" heuristic? That way, too, if we ever decide to revisit fixing raw strings, we could revive this branch and its logic more easily. What do you think? |
@ThatsJustCheesy I'm gonna go ahead and close this for now, since it sounds like the direction has changed. But please don't hesitate to ping me if you'd like me to re-open it! |
## Summary Resolves #13294, follow-up to #13882. At #13882, it was concluded that a fix should not be offered for raw strings. This change implements that. The five rules in question are now no longer always fixable. ## Test Plan `cargo nextest run` and `cargo insta test`. --------- Co-authored-by: Micha Reiser <[email protected]>
Summary
This aims to resolve #13294 by implementing raw to non-raw string conversion, as suggested by @dscorbett. The conversion is comprehensive (as far as I'm aware), but it does introduce an unfortunate amount of complexity to the rule.
I'm new to this codebase and not a Rust expert, so my code might not be idiomatic.
Test Plan
I have added the following test cases: