Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert raw strings to non-raw when fixes add escape sequences (#13294) #13882

Closed

Conversation

ThatsJustCheesy
Copy link

Summary

This aims to resolve #13294 by implementing raw to non-raw string conversion, as suggested by @dscorbett. The conversion is comprehensive (as far as I'm aware), but it does introduce an unfortunate amount of complexity to the rule.

I'm new to this codebase and not a Rust expert, so my code might not be idiomatic.

Test Plan

I have added the following test cases:

raw_single_singlequote = r'\ \' " �'
raw_triple_singlequote = r'''\ ' " �'''
raw_single_doublequote = r"\ ' \" �"
raw_triple_doublequote = r"""\ ' " �"""
raw_single_singlequote_multiline = r'\' \
" \
​'
raw_triple_singlequote_multiline = r'''' \
" \
�'''
raw_single_doublequote_multiline = r"' \
\" \
�"
raw_triple_doublequote_multiline = r"""' \
" \
�"""
raw_nested_fstrings = rf'\ {rf'\ {rf'\' '}'}'

Copy link
Contributor

github-actions bot commented Oct 23, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dscorbett
Copy link

dscorbett commented Oct 23, 2024

It would also be good to test triple-quoted strings whose final characters match the outer quotation marks:

r"""​\""""
r'''​\''''

The raw_nested_fstrings tests don’t include any of the relevant control characters so they are not effective tests for these rules’ fixes.

@ThatsJustCheesy
Copy link
Author

Good points, thank you!

@dscorbett
Copy link

My previous example was incomplete: quotation marks in triple-quoted strings need escaping when they precede two more of the same quotation mark. This can also happen in the middle of a string:

r"""​\"""..."""
r'''​\'''...'''

@ThatsJustCheesy ThatsJustCheesy force-pushed the string-escape-seqs-unraw branch from 705ac51 to d4051f1 Compare October 23, 2024 21:37
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution.

I remain hesitant adding raw to regular string conversion considering how rare raw strings is. The goal with our fixes is to fix the majority of cases. It's okay if Ruff can't fix all cases. In this case, being able to fix 1-2 raw strings doesn't justify the added complexity.

My suggestion is that we simply don't provide a fix if the string is a raw-string or that we make it an unsafe fix.

Comment on lines +273 to +276
&& string_content
.as_bytes()
.get(column + 1)
.is_some_and(|c2| char::from(*c2) != c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using bytes to get the next characters panics if the next character is a non-ASCII character. We should use the chars iterator instead (they're cheap clonable)

Comment on lines +241 to +252
for (column, match_) in prefix.match_indices(&['r', 'R']) {
let c = match_.chars().next().unwrap();

let entire_string_range = match kind {
TokenKind::String => range,
_ => last_fstring_start.unwrap().range(),
};
let location = entire_string_range.start() + TextSize::try_from(column).unwrap();
let range = TextRange::at(location, c.text_len());

string_conversion_edits.push(Edit::range_deletion(range));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there can always only be at most one r or R prefix. That's why it should not be necessaryto iterate, instead you can use find (or is it position?) to get the position of the r or R character

Comment on lines +258 to +261
TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(),
),
_ => (0.into(), text.len().try_into().unwrap()),
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use text_len and return a TextRange. Returning a text range has the advantage that you can index directly by range

let string_content = &text[range];

Suggested change
TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(),
),
_ => (0.into(), text.len().try_into().unwrap()),
};
text.text_len() - string_flags.quote_len(),
),
_ => TextRange::new((0.into(), text.text_len()),
};

Comment on lines +223 to +226
diagnostic: Diagnostic::new(rule, range),
// This is integrated with other fixes and attached to the diagnostic below.
edit: Edit::range_replacement(replacement.to_string(), range),
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to handle the case where the invalid character was the last character before the quotes in a triple quoted strings:

"""test"<invalid>"""

Let's say <invalid> is the invalid character. Removing <invalid> then results in """test"""" which is not a valid non-raw strings.

@ThatsJustCheesy
Copy link
Author

In this case, being able to fix 1-2 raw strings doesn't justify the added complexity.

I kind of agree… imho, the fix should be marked unsafe for raw strings. I can open a new PR for that if preferred.

Regardless, this was a good excuse to learn Rust better :)

@dscorbett
Copy link

I think the rules should either offer safe fixes for raw strings or no fixes. Keeping the current fixes but marking them unsafe doesn’t seem useful, because they are incorrect.

@MichaReiser
Copy link
Member

I agree. We should not offer fixes if we know they're incorrect. We could consider offering fixes for raw-strings if they don't contain any quotes or backslashes.

@MichaReiser MichaReiser added the fixes Related to suggested fixes for violations label Nov 25, 2024
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make our life easier and not offer a fix instead as this seems uncommon enough.

@ThatsJustCheesy
Copy link
Author

Sorry for leaving this hanging. I've intended to scrap this work in favour of not offering to fix raw strings, but life got in the way. Happy to revise this next month.

@MichaReiser
Copy link
Member

No worries and thanks for the heads up. There are no expectations at all. I only requested changes to make it clear that this PR isn't waiting on me.

@dylwil3
Copy link
Collaborator

dylwil3 commented Jan 18, 2025

@ThatsJustCheesy would you like me to close this PR so you can make a new one with the simpler "skip it" heuristic? That way, too, if we ever decide to revisit fixing raw strings, we could revive this branch and its logic more easily. What do you think?

@dylwil3
Copy link
Collaborator

dylwil3 commented Jan 24, 2025

@ThatsJustCheesy I'm gonna go ahead and close this for now, since it sounds like the direction has changed. But please don't hesitate to ping me if you'd like me to re-open it!

@dylwil3 dylwil3 closed this Jan 24, 2025
MichaReiser added a commit that referenced this pull request Feb 13, 2025
## Summary

Resolves #13294, follow-up to #13882.

At #13882, it was concluded that a fix should not be offered for raw
strings. This change implements that. The five rules in question are now
no longer always fixable.

## Test Plan

`cargo nextest run` and `cargo insta test`.

---------

Co-authored-by: Micha Reiser <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixes Related to suggested fixes for violations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PLE2510, PLE2512, PLE2513, PLE2514, and PLE2515 fixes add escape sequences to raw strings
4 participants