Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Hand-written MDTest parser #15926

Merged
merged 6 commits into from
Feb 4, 2025
Merged

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Feb 4, 2025

Summary

Replaces our existing Markdown test parser with a fully hand-written parser. I tried to fix this bug using the old approach and kept running into problems. Eventually this seemed like the easier way. It's more code (+50 lines, excluding the new test), but I hope it's relatively straightforward to understand, compared to the complex interplay between the byte-stream-manipulation and regex-parsing that we had before.

I did not really focus on performance, as the parsing time does not dominate the test execution time, but this seems to be slightly faster than what we had before (executing all MD tests; debug):

Command Mean [s] Min [s] Max [s] Relative
this branch 2.775 ± 0.072 2.690 2.877 1.00
main 2.921 ± 0.034 2.865 2.967 1.05 ± 0.03

closes #15923

Test Plan

One new regression test.

@sharkdp sharkdp added testing Related to testing Ruff itself red-knot Multi-file analysis & type inference labels Feb 4, 2025
Comment on lines 341 to 344
if code.ends_with('\n') {
code = &code[..code.len() - 1];
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that we could remove. It seems strange that we trim trailing newlines of code blocks. But I didn't want to change the behavior in this PR.

@sharkdp sharkdp force-pushed the david/mdtest-parser-rewrite branch from 1aee7ac to 38321c9 Compare February 4, 2025 11:25
self.cursor.bump();

if self.preceding_blank_lines < 1 && self.explicit_path.is_none() {
bail!("Code blocks must start on a new line and be preceded by at least one blank line.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message could be improved/adjusted according to the logic in the if condition, but I didn't want to introduce and behavioral changes here.

@sharkdp sharkdp force-pushed the david/mdtest-parser-rewrite branch from 38321c9 to 987ea75 Compare February 4, 2025 11:29
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. No more regex :)

crates/red_knot_test/src/parser.rs Show resolved Hide resolved
crates/red_knot_test/src/parser.rs Outdated Show resolved Hide resolved
crates/red_knot_test/src/parser.rs Outdated Show resolved Hide resolved
crates/red_knot_test/src/parser.rs Outdated Show resolved Hide resolved
crates/red_knot_test/src/parser.rs Outdated Show resolved Hide resolved
Copy link
Contributor

github-actions bot commented Feb 4, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@sharkdp sharkdp merged commit 9a33924 into main Feb 4, 2025
21 checks passed
@sharkdp sharkdp deleted the david/mdtest-parser-rewrite branch February 4, 2025 13:01
@@ -92,6 +92,18 @@ impl<'a> Cursor<'a> {
}
}

/// Eats the next two characters if they are `c1` and `c2`. Does not
/// consume any input otherwise, even if the first character matches.
pub fn eat_char2(&mut self, c1: char, c2: char) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I wasn't aware that eat_char2 is only available on the ruff_python_parser Cursor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. And I didn't notice that eat_char2 was available somewhere. My implementation is slightly different:

pub fn eat_char2(&mut self, c1: char, c2: char) -> bool {
        if self.first() == c1 && self.second() == c2 {
            self.bump();
            self.bump();
            true
        } else {
            false
        }
    }

vs the existing:

pub(super) fn eat_char2(&mut self, c1: char, c2: char) -> bool {
        let mut chars = self.chars.clone();
        if chars.next() == Some(c1) && chars.next() == Some(c2) {
            self.bump();
            self.bump();
            true
        } else {
            false
        }
    }

Should I do anything about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. The second might be slightly faster but then, it's used rarely.

sharkdp added a commit that referenced this pull request Feb 4, 2025
…omment test (#15933)

## Summary

I experimented with [not trimming trailing newlines in code
snippets](#15926 (comment)),
but since came to the conclusion that the current behavior is better
because otherwise, there is no way to write snippets without a trailing
newline at all. And when you copy the code from a Markdown snippet in
GitHub, you also don't get a trailing newline.

I was surprised to see some test failures when I played with this
though, and decided to make this test independent from this
implementation detail.
dcreager added a commit that referenced this pull request Feb 4, 2025
* main: (66 commits)
  [red-knot] Use ternary decision diagrams (TDDs) for visibility constraints (#15861)
  [`pyupgrade`] Rename private type parameters in PEP 695 generics (`UP049`) (#15862)
  Simplify the `StringFlags` trait (#15944)
  [`flake8-pyi`] Make `PYI019` autofixable for `.py` files in preview mode as well as stubs (#15889)
  Docs (`linter.md`): clarify that Python files are always searched for in subdirectories (#15882)
  [`flake8-pyi`] Make PEP-695 functions with multiple type parameters fixable by PYI019 again (#15938)
  [red-knot] Use unambiguous invalid-syntax-construct for suppression comment test (#15933)
  Make `Binding::range()` point to the range of a type parameter's name, not the full type parameter (#15935)
  Update black deviations (#15928)
  [red-knot] MDTest: Fix line numbers in error messages (#15932)
  Preserve triple quotes and prefixes for strings (#15818)
  [red-knot] Hand-written MDTest parser (#15926)
  [`pylint`] Fix missing parens in unsafe fix for `unnecessary-dunder-call` (`PLC2801`) (#15762)
  nit: docs for ignore & select (#15883)
  [airflow] `BashOperator` has been moved to `airflow.providers.standard.operators.bash.BashOperator` (AIR302) (#15922)
  [`flake8-logging`] `.exception()` and `exc_info=` outside exception handlers (`LOG004`, `LOG014`) (#15799)
  [red-knot] Enforce specifying paths for mdtest code blocks in a separate preceding line (#15890)
  [red-knot] Internal refactoring of visibility constraints API (#15913)
  [red-knot] Implicit instance attributes (#15811)
  [`flake8-comprehensions`] Handle extraneous parentheses around list comprehension (`C403`) (#15877)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference testing Related to testing Ruff itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[red-knot] MDTest parsing problems
2 participants