Skip to content

Feat: Support <script> #18

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Feat: Support <script> #18

wants to merge 8 commits into from

Conversation

ymtszw
Copy link

@ymtszw ymtszw commented Feb 19, 2022

Hi @hecrj !

I was trying to utilize this package to parse external HTML and possibly generate link preview metadata from OpenGraph/Twitter Card meta tags.
Then I had found that parsing can oftentimes fail due to not-yet-supported <script> tags.

I thought about light-weight workarounds, but in the end, found out that contributing to support it properly is actually faster! So here it goes.
The implementation is not perfectly based on HTML standard, but I had checked relavant documents frequently while implementing this, so it should do not-so-bad. Added real world test case too.

Ping me anytime on discussion about this patch, on the GitHub, or Slack. Thanks in advance!

Comment on lines +369 to +375
justOneChar : Parser String
justOneChar =
Parser.loop () <|
\_ ->
Parser.chompIf (always True)
|> Parser.getChompedString
|> Parser.map Parser.Done
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "consuming just one whatever character (after backslash escape)" is somewhat rough, but couldn't come up with better impl



javaScriptStringLike : Char -> Parser String
javaScriptStringLike terminatorChar =
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danneu
Copy link

danneu commented May 17, 2022

Good stuff, @ymtszw.

I used your work in my parser: https://github.com/danneu/elm-html-parser.

I believe the only change I made was in stringHelp:

stringHelp terminatorChar terminatorStr acc =
    Parser.oneOf
        [ Parser.succeed (\char -> Parser.Loop (acc ++ "\\" ++ char))
            |. Parser.token "\\"
             |= justOneChar
         , Parser.token terminatorStr
             |> Parser.map (\_ -> Parser.Done acc)
-        , Parser.chompWhile (\char -> char /= '\\' && char /= terminatorChar)
+        , chompOneOrMore (\char -> char /= '\\' && char /= terminatorChar)
             |> Parser.getChompedString
             |> Parser.map (\chunk -> Parser.Loop (acc ++ chunk))
         ]

Since chompWhile always succeeds (with 0 chars consumed), I was getting infinite loop on input <script>'. Perhaps you can verify whether this is an issue? I may have changed other things, I don't remember.

Just wanted to thank you for your work.

@ymtszw
Copy link
Author

ymtszw commented May 17, 2022

I see. Will add test cases if I got time

@@ -240,6 +348,8 @@ errorTests =
, test "wrong DOCTYPE keyword" (testDocumentError "<!DOCTYRP html><html></html>")
, test "wrong DOCTYPE" (testDocumentError "<!DOCTYPE httl><html></html>")
, test "wrong html tag" (testDocumentError "<!DOCTYPE html><document></document>")
, test "incomplete script1" (testDocumentError "<script>")
, test "incomplete script2 (PR#18 comment)" (testDocumentError "<script>'")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danneu Added tests for your cases.
Though I observed that without the change to chompOneOrMore this test still passes without infinite loop.
Nevertheless, accepting champOneOrMore does not harm the rest of the suites so I made the change.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into it. Infinite loops seem to be pretty easy for me to create with elm/parser so I'm not surprised that it's a me problem. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants