Strange template parsing bug in deeply nested scenario #313

Closed
davidebbo opened this issue Dec 29, 2023 · 3 comments

davidebbo (Contributor) commented Dec 29, 2023

Using version 0.6.5.

I'm running into a puzzling issue in a nested scenario. The code below is a complete repro case. For some reason, the parser treats p15 and p21 as parameters of the same template, even though p21 belongs to a child template.

Interesting observations:

  • If you remove one level of nesting (e.g. delete the p1 line and one }} at the end), it no longer happens.
  • If you change |p20={{tpl }} to |p20=1 (a non-template value), it no longer happens.

I started from a far more complex repro case, and this is as much as I could simplify it.

import mwparserfromhell

input = """
{{tpl
  |p1={{tpl
    |p2={{tpl
      |p3={{tpl
        |p4={{tpl
          |p5={{tpl
            |p6={{tpl
              |p7={{tpl
                |p8={{tpl
                  |p9={{tpl
                    |p10={{tpl
                      |p11={{tpl
                        |p12={{tpl
                          |p13={{tpl
                            |p14={{tpl
                              |p15={{tpl
                                |p16={{tpl
                                  |p17={{tpl
                                    |p18={{tpl
                                      |p19={{tpl
                                        |p20={{tpl }}
                                      }}
                                    }}
                                  }}
                                }}
                                |p21=1
                                }}
                              |p22=1}} }} }} }} }} }} }} }} }} }} }} }} }} }} }}
"""

wikicode = mwparserfromhell.parse(input)
template = wikicode.filter_templates(matches=lambda t: t.has_param("p15"))[0]

# Prints ['p15', 'p21'], which is wrong: p21 is nested more deeply than p15 (it belongs to the same template as p16).
# The expected output is ['p15', 'p22'].
print([p.name for p in template.params])
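Both workarounds in the list above remove exactly one level of template nesting, which hints that nesting depth is what matters. A quick way to check how deep a piece of wikitext nests is a naive brace counter. This is a sketch: `max_template_nesting` and `nested_chain` are hypothetical helpers, not part of mwparserfromhell, and the counter only tracks `{{`/`}}` pairs, ignoring `{{{arguments}}}`, `<nowiki>`, comments and everything else a real tokenizer handles.

```python
def max_template_nesting(text: str) -> int:
    """Return the deepest {{ ... }} nesting level found in text.

    Naive sketch: it scans for "{{" and "}}" pairs only, so {{{arguments}}},
    <nowiki>, comments and other wikitext subtleties are not handled.
    """
    depth = deepest = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            deepest = max(deepest, depth)
            i += 2
        elif pair == "}}":
            depth = max(depth - 1, 0)  # ignore stray closing braces
            i += 2
        else:
            i += 1
    return deepest


def nested_chain(levels: int) -> str:
    """Hypothetical helper: build `levels` nested templates shaped like the repro.

    The outermost template gets p1, the next one p2, and so on; the innermost
    template is the empty {{tpl }}. The repro's p21/p22 siblings are omitted.
    """
    text = "{{tpl }}"
    for n in range(levels - 1, 0, -1):
        text = "{{tpl|p%d=%s}}" % (n, text)
    return text


print(max_template_nesting(nested_chain(21)))  # 21
print(max_template_nesting(nested_chain(20)))  # 20
```

Run on the repro's `input` string, `max_template_nesting` returns 21; with either workaround applied it returns 20. `nested_chain(n)` reproduces the same chain of templates without the p21/p22 siblings, which can help bisect the threshold by feeding it to `mwparserfromhell.parse`.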

davidebbo (Contributor, Author) commented Dec 29, 2023

I debugged it and found the problem: the nesting exceeds MAX_DEPTH, which is hard-coded to 40 in both the C tokenizer and the Python tokenizer.

Any chance of bumping up the limit, or making it configurable? Note that this is not an artificial scenario: it occurs in the second cladogram on https://en.wikipedia.org/w/index.php?title=Ankylosauria. That page would need a depth limit of 60. Thanks!

Side suggestion: it may be better to fail hard by default when the maximum depth is reached, to make it easier to understand what is going on. But for now I'll be perfectly happy with a limit bump :)
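To illustrate how exceeding a depth limit can silently misattribute parameters instead of failing, here is a toy recursive-descent parser. It is not mwparserfromhell's tokenizer: it assumes (the issue does not say so) that an over-deep `{{` degrades to plain text, and `ToyParser`, `DepthExceeded`, `max_depth` and `strict` are hypothetical names. With the default `strict=False`, the next `}}` then closes the parent template early, so a child template's parameter ends up on its parent, much like p21 in the repro. With `strict=True` it raises instead, i.e. the hard-fail behavior from the side suggestion.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import List, Union

# Tokens: double braces, pipes, equals signs, runs of other text, lone braces.
TOKENS = re.compile(r"\{\{|\}\}|[|=]|[^{}|=]+|[{}]")


class DepthExceeded(Exception):
    """Hypothetical error raised in strict mode when nesting passes max_depth."""


@dataclass
class Param:
    key: str  # "" for positional parameters
    value: List[Union[str, Template]]


@dataclass
class Template:
    name: str
    params: List[Param] = field(default_factory=list)


class ToyParser:
    """Toy parser for {{name|key=value}} with a configurable depth limit."""

    def __init__(self, text: str, max_depth: int, strict: bool = False):
        self.toks = TOKENS.findall(text)
        self.pos = 0
        self.max_depth = max_depth
        self.strict = strict

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def parse(self) -> List[Union[str, Template]]:
        # At the top level nothing stops the scan, so stray "}}" or "|" stay text.
        return self.nodes_until(set(), depth=0)

    def nodes_until(self, stops, depth):
        nodes = []
        while (tok := self.peek()) is not None and tok not in stops:
            self.pos += 1
            if tok == "{{":
                if depth + 1 <= self.max_depth:
                    nodes.append(self.template(depth + 1))
                    continue
                if self.strict:
                    raise DepthExceeded(f"templates nested deeper than {self.max_depth}")
                # Assumed silent fallback: keep the over-deep "{{" as plain text.
            nodes.append(tok)
        return nodes

    def template(self, depth):
        name = self.nodes_until({"|", "}}"}, depth)
        tpl = Template("".join(n for n in name if isinstance(n, str)).strip())
        while self.peek() == "|":
            self.pos += 1
            part = self.nodes_until({"=", "|", "}}"}, depth)
            if self.peek() == "=":
                self.pos += 1
                key = "".join(n for n in part if isinstance(n, str)).strip()
                tpl.params.append(Param(key, self.nodes_until({"|", "}}"}, depth)))
            else:
                tpl.params.append(Param("", part))
        if self.peek() == "}}":  # an unclosed template simply ends at EOF
            self.pos += 1
        return tpl


src = "{{a|p1={{b|p2={{c}}|p3=1}}|p4=1}}"
print([p.key for p in ToyParser(src, max_depth=3).parse()[0].params])  # ['p1', 'p4']
print([p.key for p in ToyParser(src, max_depth=2).parse()[0].params])  # ['p1', 'p3']
```

With `max_depth=2`, `{{c` becomes text, the `}}` meant for `c` closes `b`, and `p3` (which belongs to `b`) lands on `a`, while `|p4=1}}` is left over as top-level text.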

davidebbo changed the title from "Strange template parsing bug is deeply nested scenario" to "Strange template parsing bug in deeply nested scenario" on Dec 29, 2023.
davidebbo (Contributor, Author) commented:

I sent a PR that sets the limit to 100, which appears to be what MediaWiki now uses.

earwig (Owner) commented Jan 3, 2024 via email

earwig closed this as completed Jan 4, 2024