Cannot determine if whitespace exists between nodes #137

Closed
nonara opened this issue Jul 11, 2021 · 0 comments

Collaborator

nonara commented Jul 11, 2021

Issue

Assuming:

<span>test1</span> <span>test2</span>
<span>test3</span>
<span>test4</span>

In browsers, this is rendered as:


test1 test2 test3 test4


However, the parser renders it as test1test2test3test4.

The deeper issue is that whitespace between nodes is not being recorded or indicated in any way.

Solutions

Playing around with this on astexplorer.net shows that most parsers (e.g. htmlparser2, parse5) create a TextNode for the whitespace.

What's interesting, however, is that Angular takes a more intelligent route, which is likely faster. Like node-html-parser, it does not create a TextNode for whitespace between nodes. Instead, it lets users detect that whitespace themselves from the range information attached to each node.

The range information, offered by most parsers, is simply the pair of source indexes where a node begins and ends. Specifically, the start is the index of the first char of the opening tag, and the end is the index just past the last char of the closing tag, so two adjacent nodes with nothing between them share the same end and start value.
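To make the idea concrete, here is a minimal sketch of what a range means in practice. The `RangedNode` type and the `sourceOf` helper are hypothetical names used only for illustration, not part of any parser's API, and the sketch assumes the end index is exclusive (it points just past the closing tag).

```typescript
// Hypothetical node shape: a [start, end) pair of indexes into the source.
// Assumption: `end` is exclusive, i.e. one past the last char of the closing tag.
interface RangedNode {
  range: [number, number];
}

// Recover the exact slice of source text that a node spans.
function sourceOf(source: string, node: RangedNode): string {
  const [start, end] = node.range;
  return source.slice(start, end);
}

const html = "<span>test1</span> <span>test2</span>";

// Ranges written out by hand for this input; a parser would supply them.
const first: RangedNode = { range: [0, 18] };
const second: RangedNode = { range: [19, 37] };

console.log(sourceOf(html, first));  // the first <span> element
console.log(sourceOf(html, second)); // the second <span> element
```

The single space at index 18 belongs to neither node, which is exactly the information the range comparison exposes.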

Proposed solution

I propose simply adding a range array to each node, per convention. In so doing, we are able to determine whether a node has trailing whitespace.

For example:

<!-- The following nodes have contiguous ranges. With an exclusive end index, the ranges are [ 0, 18 ] and [ 18, 36 ], respectively. -->
<!-- When we compare the end of the first node (18) with the start of the next (18), we can see there is no space -->
<span>text1</span><span>text2</span>

<!-- These nodes, however, are non-contiguous. With an exclusive end index, the ranges are [ 0, 18 ] and [ 20, 38 ], respectively. -->
<!-- By comparing the end (18) and start (20) locations, we know that there is at least one whitespace char between them -->
<span>text1</span>  <span>text2</span>
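The comparison described in the comments above could be sketched as follows. `NodeWithRange` and `whitespaceBetween` are hypothetical names chosen for this example, not the API of the forthcoming PR, and the ranges are assumed to use an exclusive end index.

```typescript
// Hypothetical node shape; assumes range is [start, end) with an exclusive end.
interface NodeWithRange {
  range: [number, number];
}

// Return the whitespace separating two sibling nodes, or null when the gap
// is empty or contains anything other than whitespace.
function whitespaceBetween(
  source: string,
  left: NodeWithRange,
  right: NodeWithRange
): string | null {
  const gap = source.slice(left.range[1], right.range[0]);
  return /^\s+$/.test(gap) ? gap : null;
}

// Contiguous nodes: the first ends where the second starts.
const contiguous = "<span>text1</span><span>text2</span>";
const a: NodeWithRange = { range: [0, 18] };
const b: NodeWithRange = { range: [18, 36] };

// Non-contiguous nodes: two spaces sit between the ranges.
const spaced = "<span>text1</span>  <span>text2</span>";
const c: NodeWithRange = { range: [0, 18] };
const d: NodeWithRange = { range: [20, 38] };

console.log(whitespaceBetween(contiguous, a, b)); // null
console.log(JSON.stringify(whitespaceBetween(spaced, c, d))); // "  "
```

Slicing the source rather than only comparing the two numbers also tells the caller what kind of whitespace is present (spaces, tabs, newlines), which a consumer might need when deciding how to collapse it.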

I am submitting a PR shortly.

Related Issue

crosstype/node-html-markdown#16

nonara added a commit to nonara/node-html-parser that referenced this issue Jul 12, 2021
@taoqf taoqf closed this as completed in a64f336 Jul 13, 2021
taoqf added a commit that referenced this issue Jul 13, 2021
feat: Add range to nodes and fix whitespace issue (fixes #137)