-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML API: Expose raw tag markup to support existing filters #5143
base: trunk
Are you sure you want to change the base?
Changes from all commits
0e6201a
2719cfe
7d43d42
7151083
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -711,6 +711,41 @@ public function release_bookmark( $name ) { | |
} | ||
|
||
|
||
/** | ||
* Allow internal WordPress code to extract raw HTML snippets associated with a matched tag. | ||
* | ||
* This is meant for internal use to support existing filters that pass segments of input HTML | ||
* for further analysis. Filters which rely on these values should be updated to interact with | ||
* the structural methods this class provides. | ||
* | ||
* @access private | ||
* | ||
* @since 6.4.0 | ||
* | ||
* @param string $internal_use_only Required to accept liability for exposing system internals through this function. | ||
* @param string $content What portion of the raw HTML markup to return: 'entire-tag' or 'only-attributes'. | ||
* @return string|null Raw HTML markup from input document for matched tag, if matched, else NULL. | ||
*/ | ||
public function _wp_internal_extract_raw_token_markup( $internal_use_only, $contents = 'entire-tag' ) { | ||
$disclaimer = <<<INTERNAL_ONLY | ||
I understand that this is only for internal WordPress usage and something | ||
will likely break if used elsewhere. This function comes with no warranty. | ||
INTERNAL_ONLY; | ||
|
||
if ( $internal_use_only !== $disclaimer || null === $this->tag_name_starts_at ) { | ||
Comment on lines
+730
to
+735
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is an interesting approach. I haven't seen how we do things like this in WP - is this the first function of such nature? Another potential idea is to prefix such functions with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gutenberg uses this in some places. we considered this is a method I think I would prefer we never merge, and instead of supporting these old filters we move to new ones entirely. if we do put these in I want to discourage people as much as possible to call them separately, but we need to make the methods public for Core's internal consumption. |
||
return null; | ||
} | ||
Comment on lines
+730
to
+737
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 😅 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not following the meaning of your emoji. The point I'm trying to make here is that I really want the API to discourage use of this function. It's dangerous because it carries the legacy of interacting with HTML as a string that led to all the issues that brought us here. In fact the only reason I'm considering this is a purely pragmatic backwards-compatibility issue. Core (and Jetpack and many other popular plugins) expose full or near-full HTML syntax through filters so that extending code can do further processing. For example, a full link HTML is passed into We could reconstruct normative HTML for these filters, but that would involve iterating over the attributes to create strings when we have one already. Maybe it's a worthwhile tradeoff, but I think most of those filters are unused and so if we can pass along a substring of an existing string and not touch it, that will have far less impact. I'd prefer we find ways to update existing code to work structurally instead stringily. We don't need this function today; it's merely an aid to refactors that need the HTML API. If we can deprecate all those filters and leave time to update, then we can avoid this entire thing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks a lot for the added context and explanation, much appreciated! The emoji meant something to the effect, "Wowzers, this one blows the one we use for the constructor out of the water. I guess that's the only precedent we have, but it's probably warranted." There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well I did realize that I can't make a class const private, so the one in the constructor could be easily foiled by calling In that case I'm less worried anyway because it's there to alert someone why something broke. In this case it's to try and put pressure against dangerous programming habits that will work if attempted, at least to a point. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the detailed explanation, @dmsnell. Could you provide an example snippet used in Jetpack (or other popular plugin), and how that code should look like after this patch? Just to get a better sense of the changes introduced here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @bor0 there is no code change required by this PR for other plugins and code authors. the point is to preserve backwards compatibility. however, here is one of many examples of how existing filters rely on the raw markup. // formatting.php
$rel = apply_filters( 'wp_targeted_link_rel', 'noopener', $link_html ); here, the lots of existing code then parses this with a regex or string-find, and breaks down in the same cases that normal HTML parsing fails: boolean attribute values, different quoting than expected, duplicate or differently-ordered attributes, escaping issues, etc… ideally we would move Core towards a future where instead of passing around these strings we could pass around a kind of read-only or otherwise limited HTML Tag Processor, so that filters could reliably read attribute values, but changing the filter would mean breaking those existing plugins |
||
|
||
switch ( $contents ) { | ||
case 'entire-tag': | ||
return substr( $this->html, $this->tag_name_starts_at - 1, $this->tag_ends_at - $this->tag_name_starts_at + 2 ); | ||
|
||
case 'only-attributes': | ||
return substr( $this->html, $this->tag_name_starts_at + $this->tag_name_length + 1, $this->tag_ends_at - $this->tag_name_starts_at - $this->tag_name_length - 1 ); | ||
} | ||
} | ||
|
||
|
||
/** | ||
* Skips contents of title and textarea tags. | ||
* | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
<?php | ||
/** | ||
* Unit tests covering WP_HTML_Tag_Processor internal helpers. | ||
* | ||
* @package WordPress | ||
* @subpackage HTML-API | ||
*/ | ||
|
||
/** | ||
* @group html-api | ||
* | ||
* @coversDefaultClass WP_HTML_Tag_Processor | ||
*/ | ||
class Tests_HtmlApi_WpHtmlTagProcessor_Internals extends WP_UnitTestCase { | ||
/** | ||
* @ticket 59291 | ||
* | ||
* @dataProvider data_html_with_entire_tag_raw_markup | ||
* | ||
* @param string $html_with_target_element HTML with a tag containing "target" attribute. | ||
* @param string $expected_raw_markup Expected full raw markup for targeted tag. | ||
*/ | ||
public function test_returns_raw_html_markup_for_entire_tag( $html_with_target_element, $expected_raw_markup ) { | ||
$processor = new WP_HTML_Tag_Processor( $html_with_target_element ); | ||
while ( $processor->next_tag() && null === $processor->get_attribute( 'target' ) ) { | ||
continue; | ||
} | ||
|
||
$this->assertSame( | ||
$expected_raw_markup, | ||
$processor->_wp_internal_extract_raw_token_markup( | ||
<<<INTERNAL_ONLY | ||
I understand that this is only for internal WordPress usage and something | ||
will likely break if used elsewhere. This function comes with no warranty. | ||
INTERNAL_ONLY | ||
, | ||
'entire-tag' | ||
), | ||
'Failed to exactly extract raw HTML for matched tag.' | ||
); | ||
} | ||
|
||
/** | ||
* Data provider. | ||
* | ||
* @return array[]. | ||
*/ | ||
public function data_html_with_entire_tag_raw_markup() { | ||
return array( | ||
'Tag at start of string' => array( '<img target src="atat.png">', '<img target src="atat.png">' ), | ||
'Tag in middle of string' => array( '<a href="#"><img target src="atat.png"></a>', '<img target src="atat.png">' ), | ||
'Tag at end of string' => array( '<a href="#"><img src="atat.png"><div id="5" target class="dingy">', '<div id="5" target class="dingy">' ), | ||
'Tab after tag name' => array( "<img\ttarget>", "<img\ttarget>" ), | ||
'Space after attributes' => array( '<img target >', '<img target >' ), | ||
'Unquoted attribute' => array( '<span id = 5 class=sunshine target>', '<span id = 5 class=sunshine target>' ), | ||
'Value with character references' => array( '<a href="#" title="This is > That" target>', '<a href="#" title="This is > That" target>' ), | ||
); | ||
} | ||
|
||
/** | ||
* @ticket 59291 | ||
* | ||
* @dataProvider data_html_with_only_attributes_raw_markup | ||
* | ||
* @param string $html_with_target_element HTML with a tag containing "target" attribute. | ||
* @param string $expected_raw_markup Expected raw markup for targeted tag containing only the attributes. | ||
*/ | ||
public function test_returns_raw_html_markup_for_only_attributes( $html_with_target_element, $expected_raw_markup ) { | ||
$processor = new WP_HTML_Tag_Processor( $html_with_target_element ); | ||
while ( $processor->next_tag() && null === $processor->get_attribute( 'target' ) ) { | ||
continue; | ||
} | ||
|
||
$this->assertSame( | ||
$expected_raw_markup, | ||
$processor->_wp_internal_extract_raw_token_markup( | ||
<<<INTERNAL_ONLY | ||
I understand that this is only for internal WordPress usage and something | ||
will likely break if used elsewhere. This function comes with no warranty. | ||
INTERNAL_ONLY | ||
, | ||
'only-attributes' | ||
), | ||
'Failed to exactly extract raw HTML for matched tag.' | ||
); | ||
} | ||
|
||
/** | ||
* Data provider. | ||
* | ||
* @return array[]. | ||
*/ | ||
public function data_html_with_only_attributes_raw_markup() { | ||
return array( | ||
'Tag at start of string' => array( '<img target src="atat.png">', 'target src="atat.png"' ), | ||
'Tag in middle of string' => array( '<a href="#"><img target src="atat.png"></a>', 'target src="atat.png"' ), | ||
'Tag at end of string' => array( '<a href="#"><img src="atat.png"><div id="5" target class="dingy">', 'id="5" target class="dingy"' ), | ||
'Tab after tag name' => array( "<img\ttarget>", 'target' ), | ||
'Space after attributes' => array( '<img target >', 'target ' ), | ||
'Unquoted attribute' => array( '<span id = 5 class=sunshine target>', 'id = 5 class=sunshine target' ), | ||
'Value with character references' => array( '<a href="#" title="This is > That" target>', 'href="#" title="This is > That" target' ), | ||
); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it only necessary to check for
tag_name_starts_at
? If that property is set, does it follow thattag_ends_at
is set too? If not, we could add the check here, but if it does, adding a clarifying (assertion) comment would improve code readability.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's correct, and as internal values this isn't exposed. it's an idiom used throughout the Tag Processor. I've thought about different means to check what we're doing, but I don't think adding the same comment to twenty different places within the class would be that useful.
I do agree this is a bit indirect; more or less though I'm waiting until a much clearer and better solution appears.