-
Notifications
You must be signed in to change notification settings - Fork 25
Feat: Support <script>
#18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
justOneChar : Parser String | ||
justOneChar = | ||
Parser.loop () <| | ||
\_ -> | ||
Parser.chompIf (always True) | ||
|> Parser.getChompedString | ||
|> Parser.map Parser.Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "consuming just one whatever character (after backslash escape)" is somewhat rough, but couldn't come up with better impl
|
||
|
||
javaScriptStringLike : Char -> Parser String | ||
javaScriptStringLike terminatorChar = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly based on https://github.com/elm/parser/blob/master/examples/DoubleQuoteString.elm
Good stuff, @ymtszw. I used your work in my parser: https://github.com/danneu/elm-html-parser. I believe the only change I made was in stringHelp terminatorChar terminatorStr acc =
Parser.oneOf
[ Parser.succeed (\char -> Parser.Loop (acc ++ "\\" ++ char))
|. Parser.token "\\"
|= justOneChar
, Parser.token terminatorStr
|> Parser.map (\_ -> Parser.Done acc)
- , Parser.chompWhile (\char -> char /= '\\' && char /= terminatorChar)
+ , chompOneOrMore (\char -> char /= '\\' && char /= terminatorChar)
|> Parser.getChompedString
|> Parser.map (\chunk -> Parser.Loop (acc ++ chunk))
] Since Just wanted to thank you for your work. |
I see. Will add test cases if I got time |
@@ -240,6 +348,8 @@ errorTests = | |||
, test "wrong DOCTYPE keyword" (testDocumentError "<!DOCTYRP html><html></html>") | |||
, test "wrong DOCTYPE" (testDocumentError "<!DOCTYPE httl><html></html>") | |||
, test "wrong html tag" (testDocumentError "<!DOCTYPE html><document></document>") | |||
, test "incomplete script1" (testDocumentError "<script>") | |||
, test "incomplete script2 (PR#18 comment)" (testDocumentError "<script>'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danneu Added tests for your cases.
Though I observed that without the change to chompOneOrMore
this test still passes without infinite loop.
Nevertheless, accepting champOneOrMore
does not harm the rest of the suites so I made the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into it. Infinite loops seem to be pretty easy for me to create with elm/parser so I'm not surprised that it's a me problem. :)
Hi @hecrj !
I was trying to utilize this package to parse external HTML and possibly generate link preview metadata from OpenGraph/Twitter Card meta tags.
Then I had found that parsing can oftentimes fail due to not-yet-supported
<script>
tags.I thought about light-weight workarounds, but in the end, found out that contributing to support it properly is actually faster! So here it goes.
The implementation is not perfectly based on HTML standard, but I had checked relavant documents frequently while implementing this, so it should do not-so-bad. Added real world test case too.
Ping me anytime on discussion about this patch, on the GitHub, or Slack. Thanks in advance!