Feat: Support `<script>` #18

ymtszw · 2022-02-19T12:11:21Z

I was trying to use this package to parse external HTML and possibly generate link-preview metadata from OpenGraph/Twitter Card meta tags.
Then I found that parsing can often fail because of not-yet-supported `<script>` tags.

I thought about lightweight workarounds, but in the end found that contributing proper support is actually faster! So here it goes.
The implementation does not follow the HTML standard exactly, but I checked the relevant documents frequently while implementing it, so it should work reasonably well. I added a real-world test case too.

Ping me anytime to discuss this patch, on GitHub or Slack. Thanks in advance!

ymtszw · 2022-02-19T12:21:47Z

src/Html/Parser.elm

+justOneChar : Parser String
+justOneChar =
+    Parser.loop () <|
+        \_ ->
+            Parser.chompIf (always True)
+                |> Parser.getChompedString
+                |> Parser.map Parser.Done


This "consume just one character, whatever it is (after a backslash escape)" parser is somewhat rough, but I couldn't come up with a better implementation.
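
A possible simplification, offered only as a sketch: a `Parser.loop` whose step returns `Parser.Done` on its first iteration behaves the same as running the inner parser once, so the loop wrapper could arguably be dropped. The module name `JustOneChar` is hypothetical, and this is not code from the patch itself.

```elm
module JustOneChar exposing (justOneChar)

import Parser exposing (Parser)


-- Consume exactly one character, whatever it is, and return it as a String.
-- Fails at end of input, because chompIf needs a character to test.
-- (Sketch only; the patch wraps the same pipeline in Parser.loop.)
justOneChar : Parser String
justOneChar =
    Parser.chompIf (always True)
        |> Parser.getChompedString
```

Both forms should accept and reject the same inputs; the only intended difference is the removal of the single-iteration loop.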

ymtszw · 2022-02-19T12:22:21Z

src/Html/Parser.elm

+
+
+javaScriptStringLike : Char -> Parser String
+javaScriptStringLike terminatorChar =


This is mostly based on https://github.com/elm/parser/blob/master/examples/DoubleQuoteString.elm

danneu · 2022-05-17T04:34:54Z

Good stuff, @ymtszw.

I used your work in my parser: https://github.com/danneu/elm-html-parser.

I believe the only change I made was in stringHelp:

 stringHelp terminatorChar terminatorStr acc =
     Parser.oneOf
         [ Parser.succeed (\char -> Parser.Loop (acc ++ "\\" ++ char))
             |. Parser.token "\\"
             |= justOneChar
         , Parser.token terminatorStr
             |> Parser.map (\_ -> Parser.Done acc)
-        , Parser.chompWhile (\char -> char /= '\\' && char /= terminatorChar)
+        , chompOneOrMore (\char -> char /= '\\' && char /= terminatorChar)
             |> Parser.getChompedString
             |> Parser.map (\chunk -> Parser.Loop (acc ++ chunk))
         ]

Since chompWhile always succeeds (even with 0 chars consumed), I was getting an infinite loop on the input <script>'. Perhaps you can verify whether this is an issue? I may have changed other things; I don't remember.

Just wanted to thank you for your work.
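
The definition of `chompOneOrMore` is not shown in this thread. A minimal sketch of what such a helper could look like with elm/parser follows, assuming it should fail unless at least one matching character is consumed. The module name and the parameter name `isGood` are hypothetical.

```elm
module ChompOneOrMore exposing (chompOneOrMore)

import Parser exposing ((|.), Parser)


-- Require one matching character, then chomp any further matching ones.
-- Unlike Parser.chompWhile alone, this fails when nothing matches,
-- so a Parser.oneOf branch built on it cannot succeed on zero characters.
-- (Assumed shape; not copied from danneu/elm-html-parser.)
chompOneOrMore : (Char -> Bool) -> Parser ()
chompOneOrMore isGood =
    Parser.chompIf isGood
        |. Parser.chompWhile isGood
```

The design point is that a `Parser.loop` step returning `Parser.Loop` without consuming input can repeat with no progress. Making the chunk branch fail on an empty match lets `Parser.oneOf` report an error instead.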

ymtszw · 2022-05-17T15:25:24Z

I see. Will add test cases if I get time.

ymtszw · 2022-05-29T05:18:44Z

tests/Main.elm

@@ -240,6 +348,8 @@ errorTests =
        , test "wrong DOCTYPE keyword" (testDocumentError "<!DOCTYRP html><html></html>")
        , test "wrong DOCTYPE" (testDocumentError "<!DOCTYPE httl><html></html>")
        , test "wrong html tag" (testDocumentError "<!DOCTYPE html><document></document>")
+        , test "incomplete script1" (testDocumentError "<script>")
+        , test "incomplete script2 (PR#18 comment)" (testDocumentError "<script>'")


@danneu Added tests for your cases.
I observed, though, that this test still passes without an infinite loop even without the change to chompOneOrMore.
Nevertheless, adopting chompOneOrMore does not break the rest of the test suite, so I made the change.

danneu · 2022-05-30

Thanks for looking into it. Infinite loops seem to be pretty easy for me to create with elm/parser, so I'm not surprised that it's a me problem. :)

ymtszw added 5 commits February 19, 2022 20:43

test: activate scriptTests and add realWorld1 test

c619086

test: separate Double/Single quote

b1885c3

feat: support consuming <script> as a Text (and Comment)

c2e991e

chore: update README/CHANGELOG

645a07d

fix: typo

08e3e81

test: PR hecrj#18 comment

1a739f1

ymtszw added 2 commits April 9, 2023 02:38

chore: rename due to Main.elm discovery issues in some cases

690b171

fix: inappropriate module name

b89dc29
