Bug: Inside comments, `<<` is parsed as `<!` #325
Comments
I'm interested in making a PR to make comment tokenisation spec-compliant in general, but I can't get the tests to run.
Notes I took while trying to run tests
Here's the story of me trying to do that: After
The only reference to such a thing in
diff --git a/scripts/generate-parser-feedback-test/index.js b/scripts/generate-parser-feedback-test/index.js
index 0c128d9..b793c13 100644
--- a/scripts/generate-parser-feedback-test/index.js
+++ b/scripts/generate-parser-feedback-test/index.js
@@ -6,7 +6,7 @@ const Tokenizer = require('../../packages/parse5/lib/tokenizer');
const defaultTreeAdapter = require('../../packages/parse5/lib/tree-adapters/default');
const { convertTokenToHtml5Lib } = require('../../test/utils/generate-tokenization-tests');
const parseDatFile = require('../../test/utils/parse-dat-file');
-const { addSlashes } = require('../../test/test/utils/common');
+const { addSlashes } = require('../../test/utils/common');
readFile = promisify(readFile);
writeFile = promisify(writeFile);
That fixed, the script exits
So I'm guessing this script doesn't generate those files, but consumes them.
They're at such a different path that it can't be a typo. I don't know where else to look for missing mystery test data files. At this point I gave up. Advice appreciated.
@anko it's indeed a bug. You're right, we append the wrong character here: https://github.com/inikulin/parse5/blob/master/packages/parse5/lib/tokenizer/index.js#L1471
It should be
Regarding tests: you need to fetch the html5lib git submodule by running:
I see; tests are in a separate repo. Looks like they cover most of the comment spec stuff too. I'd feel dumb PRing for a 1-char diff, and it doesn't feel worthwhile even writing a regression test when the wrongness is unsubtle. Poke it right when you have time I guess.
Don't see anything dumb about it. And it's always worth writing a regression test.
Inside comments, two consecutive less-than characters (`<<`) were parsed wrongly as `<!`, due to what was probably a typo. This fixes that. Added regression test. Fixes inikulin#325.
Module: parse5 (version not preserved)
Repro steps (Linux):
Expected: `test <<`, not `test <!`.
Rationale: I find the above behaviour confusing because the HTML spec on comments does not limit how `<` can be used inside comments. A comment containing 2 consecutive less-than signs should be legal, but is currently unrepresentable.
Analysis:
I've just walked into the source and don't know the details, but it appears wrong to me that the tokeniser switches state from `COMMENT_STATE` to `COMMENT_LESS_THAN_SIGN_STATE` when encountering `<`. `COMMENT_LESS_THAN_SIGN_STATE` then treats `<` as `!` and causes the weird output seen above.
Surely `COMMENT_STATE` represents the state where we're inside the text part of the comment? The only non-error references out of there should be to itself (parsing more content) or to `COMMENT_END_DASH_STATE`, right?
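For reference, here is a rough sketch of how the HTML spec's comment states treat `<`, as far as I understand the tokenization section. It is not parse5's actual code: the function name `tokenizeCommentData` and the `buggy` option are hypothetical, made up for illustration, and the "less-than sign bang" and "comment end bang" states are left out, so `--!>` is not treated as a closer here. It suggests that the switch from the comment state to the comment less-than sign state is expected by the spec, and that the problem is only which character gets appended on a second `<`.

```javascript
// Simplified sketch of the HTML spec's comment tokenizer states.
// `input` is the text that follows the opening `<!--`.
// Hypothetical helper, not parse5's implementation.
function tokenizeCommentData(input, { buggy = false } = {}) {
    let state = 'COMMENT';
    let data = '';
    let pos = 0;

    while (pos < input.length) {
        const ch = input[pos];

        switch (state) {
            case 'COMMENT':
                if (ch === '<') {
                    data += ch;
                    state = 'COMMENT_LESS_THAN_SIGN';
                } else if (ch === '-') {
                    state = 'COMMENT_END_DASH';
                } else {
                    data += ch;
                }
                pos++;
                break;

            case 'COMMENT_LESS_THAN_SIGN':
                if (ch === '<') {
                    // Spec: append the current input character ('<').
                    // Appending '!' instead mimics the output reported in this issue.
                    data += buggy ? '!' : ch;
                    pos++;
                } else if (ch === '!') {
                    // Simplification: the spec enters the "less-than sign bang" states here.
                    data += ch;
                    state = 'COMMENT';
                    pos++;
                } else {
                    // Reconsume the character in the comment state (do not advance).
                    state = 'COMMENT';
                }
                break;

            case 'COMMENT_END_DASH':
                if (ch === '-') {
                    state = 'COMMENT_END';
                    pos++;
                } else {
                    // A lone '-' is ordinary comment data; reconsume in the comment state.
                    data += '-';
                    state = 'COMMENT';
                }
                break;

            case 'COMMENT_END':
                if (ch === '>') {
                    return data; // '-->' closes the comment
                } else if (ch === '-') {
                    data += '-';
                    pos++;
                } else {
                    // '--' not followed by '>' is data; reconsume in the comment state.
                    data += '--';
                    state = 'COMMENT';
                }
                break;
        }
    }

    return data; // end of input: emit what was collected
}
```

With this sketch, `tokenizeCommentData('test <<-->')` returns `'test <<'`, while passing `{ buggy: true }` returns `'test <!'`, matching the behaviour described above.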