Revisit PHP representation of AMP validator spec #2769

westonruter · 2019-07-08T21:10:47Z

The AMP validator spec is converted from protoascii into PHP by the bin/amphtml-update.py Python script. The script tries to extract only the information needed by the tag-and-attribute sanitizer, but the generated file is still large: 451KB, so large in fact that it crashes phpcs: #2767 (comment). Just requiring this file adds about 4MB of memory usage. We could look at splitting the file up by tag so that we only load the data for tags that are actually encountered.
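One way to picture the per-tag split is the minimal Python sketch below (Python only because the existing extraction script is written in it). The `tag_name` key, `group_specs_by_tag`, and `LazyTagSpecs` are hypothetical names chosen for illustration, not part of the plugin or of bin/amphtml-update.py, and the loader callback stands in for whatever would read a per-tag file.

```python
from collections import defaultdict


def group_specs_by_tag(tag_specs):
    """Group a flat list of tag specs into {lowercase tag name: [spec, ...]}."""
    grouped = defaultdict(list)
    for spec in tag_specs:
        # "tag_name" is an assumed key; the real generated structure may differ.
        grouped[spec["tag_name"].lower()].append(spec)
    return dict(grouped)


class LazyTagSpecs:
    """Load the specs for a tag only the first time that tag is requested."""

    def __init__(self, load_tag):
        # load_tag is a callable: lowercase tag name -> list of specs.
        self._load_tag = load_tag
        self._cache = {}

    def get(self, tag_name):
        key = tag_name.lower()
        if key not in self._cache:
            self._cache[key] = self._load_tag(key)
        return self._cache[key]
```

With this shape, a document that only contains `amp-img` and `a` elements would trigger two loads, and the data for every other tag would never be read into memory.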

Stepping back a bit further, we should also consider whether the Python script is the best way to extract the validator spec into PHP. It turns out that @fstanis has worked on making the validator spec available in JSON format: ampproject/amphtml#22528. At the very least, this would mean that we could rewrite the spec extraction logic in PHP (or even JS) instead of Python (which has the required protobuf library, though there is probably a PHP protobuf library that could have been used instead). In any case, JSON is much more comfortable to work with than protoascii, as long as there is no loss of fidelity in the conversion, which is done in validator/validator_gen_js.py:

  rules = validator_pb2.ValidatorRules()
  text_format.Merge(open(specfile).read(), rules)
  out.append(json_format.MessageToJson(rules))

Another benefit is that we would no longer have to download the entire amphtml repo, since the full spec in JSON format is always available at https://cdn.ampproject.org/v0/validator.json

This JSON file is only ~250KB, as opposed to an archive export of the amphtml repo, which is 100MB+.
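As a rough sketch of what consuming that JSON could look like, the snippet below downloads the file and indexes tag specs by tag name. It assumes the output of `json_format.MessageToJson` exposes a top-level `tags` list whose entries carry a `tagName` field (the default lowerCamelCase mapping of protobuf field names); `fetch_validator_json` and `index_tags` are hypothetical helpers, not existing APIs.

```python
import json
import urllib.request

VALIDATOR_JSON_URL = "https://cdn.ampproject.org/v0/validator.json"


def fetch_validator_json(url=VALIDATOR_JSON_URL):
    """Download the validator spec and return it as a JSON string."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def index_tags(validator_json):
    """Parse the spec and return {lowercase tag name: [tag spec, ...]}."""
    rules = json.loads(validator_json)
    index = {}
    # "tags" and "tagName" are assumed field names; check them against the real file.
    for tag in rules.get("tags", []):
        name = tag.get("tagName", "").lower()
        index.setdefault(name, []).append(tag)
    return index
```

Calling `index_tags(fetch_validator_json())` would then give a lookup table that an extraction script in any language could mirror, without cloning the amphtml repo.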


fstanis · 2019-07-08T21:17:49Z

If you go for a JS solution, ampproject/amp-toolbox#377 might be of interest.

It's meant to be a general-purpose querying library for the validator rules. It'll be minimal at first, but you're free to extend it to include the functionality you need.

schlessera · 2019-11-14T13:51:22Z

Related #3730

westonruter · 2021-06-08T01:21:49Z

Implemented in ampproject/amp-toolbox-php#100

westonruter added this to the v2.0 milestone Jul 8, 2019

swissspidy added the Enhancement (New feature or improvement of an existing one) and Sanitizers labels Jul 9, 2019

westonruter mentioned this issue Aug 26, 2019

Update allowed tags/attributes from spec in amphtml 1908162134430 #3084

Merged

5 tasks

westonruter mentioned this issue Nov 11, 2019

Remove scripts for components that were not detected in output buffer #3705

Merged

3 tasks

schlessera mentioned this issue Nov 14, 2019

Turn specs into PHP objects #3730

Closed

swissspidy mentioned this issue Jan 28, 2020

Improve amp plugin compat GoogleForCreators/web-stories-wp#140

Merged

amedina removed this from the v2.0 milestone Mar 31, 2020

westonruter mentioned this issue Apr 11, 2020

Refactor validator specification integration #4566

Closed

amedina added the P2 (Low priority) label May 14, 2020

kmyram added the WS:Perf (Work stream for Metrics, Performance and Optimizer) label Aug 5, 2020

westonruter assigned schlessera Dec 16, 2020

westonruter closed this as completed Jun 8, 2021
