Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: static analysis #767

Closed
wants to merge 6 commits into from
Closed

feat: static analysis #767

wants to merge 6 commits into from

Conversation

jg-rp
Copy link
Contributor

@jg-rp jg-rp commented Nov 14, 2024

This PR implements template static analysis and reports template variable usage.

Example

import { Liquid, analyze } from "liquidjs";

const engine = new Liquid();
const template = engine.parse('{% if a %}{{ b }}{% endif %}');
const analysis = analyze(template);

console.log(Object.keys(analysis.variables))
// ['a', 'b']

For analysis to be successful, everything implementing the Template interface must implement the optional (optional for backwards compatibility) node() method, which returns a StaticNode object. analyze() then uses those nodes to traverse the template and gather information about variables.

If a custom tag is found that does not implement node(), a StaticAnalysisError is thrown.

The result of analyze() is an object with variables, globals and locals keys. All of which are objects mapping variable (or path) names to an array of Variable instances, one entry for each occurrence of the variable.

  • variables is all variables, whether they are in scope or not. Including references to names such as forloop from the for tag.
  • globals are variables that are not in scope. These could be "global" variables that are expected to be provided by the application developer, or possible mistakes from the template author.
  • locals are template variables that are added to the template local scope using tags like assign, capture or increment.

Instances of Variable have row, col and file properties, plus segments, which is an array of strings and numbers that make up the path to the variable.

TODO

  • more tests
  • documentation

@coveralls
Copy link

Pull Request Test Coverage Report for Build 11842329528

Details

  • 111 of 208 (53.37%) changed or added relevant lines in 27 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-3.6%) to 96.352%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/tags/block.ts 0 1 0.0%
src/tags/break.ts 0 1 0.0%
src/tags/capture.ts 0 1 0.0%
src/tags/comment.ts 0 1 0.0%
src/tags/continue.ts 0 1 0.0%
src/tags/decrement.ts 0 1 0.0%
src/tags/if.ts 4 5 80.0%
src/tags/increment.ts 0 1 0.0%
src/tags/inline-comment.ts 0 1 0.0%
src/tags/liquid.ts 0 1 0.0%
Totals Coverage Status
Change from base Build 11800996948: -3.6%
Covered Lines: 2644
Relevant Lines: 2741

💛 - Coveralls

@jg-rp
Copy link
Contributor Author

jg-rp commented Nov 14, 2024

Hi @harttle. Before I go any further with this, is this something you want include in LiquidJS? and do you think the approach is OK?


return {
token: this.token,
values,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is it possible to get values by recursively iterate .token and .children? if so, can we remove exposed values on each tag and only retrieve it during analyze?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make analyze() "aware" of the standard tags, and have it "know" what properties might contain values (those properties vary quite a bit). But that would not generalize to all possible custom tags (including custom versions of standard tags), so we'd still need a way for unknown tags to tell us what values they manage.

And if we took a hybrid approach, where our recursive analysis routine reads some properties directly and relies on Node.values for others, whenever someone wants to traverse the template for their own purposes, they too would need to include that knowledge of the built in tags.

Copy link
Owner

@harttle harttle Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with hybrid approach would create more confusion. By values they manage, I guess you mean like arguments for a tag. It's not technically being children so we may want them be exposed by another property other than children. I'm not sure whether it's a good idea to expose another prop named arguments, is this sufficient? As in my understanding, a template consists of name, args and children.

The idea behind it (not explicitly design for variable analyze purpose, i.e. expose a values directly) is, it'll be more useful if node intend to expose an AST structure, which can be used more generally apart from analyzing variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, values is essentially the tag's arguments. There might be one or two cases where an argument is not a value (as in an instance of the Value class or ValueToken type), but the distinction is not really important. We could rename values to arguments.

I hadn't intended nodes returned from Template.node() (I should rename that) to form an AST. In my head it is more of an auxiliary tree, specifically designed to carry template information useful for static analysis in the absence of an AST and without introducing breaking changes.

token: this.token,
values, // Values from this.hash and this.file
children: [],
blockScope, // Keys from this.hash and withVar
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question for blockScope and templateScope, as they seem to be transient variables only needed for analyze, better make them computed instead of implemented in each tag.

But this issue is not big deal, I can help with it latter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example for where blockScope is necessary, I often use a non-standard {% with %} tag, inspired by Django's template engine. It aliases and effectively caches long variable/property paths and keeps them in scope for the duration of its block.

For static analysis to work here, the tag must populate blockScope with those aliases, and in a way that informs the caller that these names go out of scope after children have been visited.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, reasonable. As we'll need blockScope and templateScope, I don't think another values is not acceptable. Please treat my above comments as questions/ideas, I think what you're doing currently is also OK.

@harttle
Copy link
Owner

harttle commented Nov 14, 2024

Thank you @jg-rp for the implementation. It's a good to have feature, there's multiple issues for accessing parsed template / iterating the parsed AST / get all output variables. The underlying feature request is to have a consistent structure (like the one returned by node()) to iterate on.

Only concern is hope it won't increase bundle size for browsers significantly.

const analysis = analyze(template)

const bc = new Variable(['b', 'c'], { row: 1, col: 6, file: undefined })
const a = new Variable(['a', bc], { row: 1, col: 4, file: undefined })
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this structure general enogth to represent nesting? Consider this one:

d[a[b.c]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some more test cases with deeper nesting.

expect(analysis).toStrictEqual({
variables: { a: [new Variable(['a'], { row: 1, col: 26, file: undefined })] },
globals: { },
locals: { a: [new Variable(['a'], { row: 1, col: 1, file: undefined })] }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess locals is for variables with a local scope, they can be local only (not depend on global variables), and also can be derived from global variables. In general people also want a list of access to their global variables. Like

{% render 'product' with featured_product as product %}
// in "product.liquid"
{{product.foo}}

What we actually care about is featured_product.foo. How do you think for this scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to think about that. A similar situation could arise from a simple {% assign %} tag.

{% assign product = featured_product %}
{{ product.foo }}

or even

{% assign thing = featured_product.foo %}
{{ thing.bar }}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a subtle difference: Assign can be used with filters which is considered as forming new values, while ‘as’ is just alias.
But I think it’s OK to not support such details, especially after we introduced ‘render’ to deprecate ’include’. That means template authors already have a chance to declare them explicitly.

@jg-rp
Copy link
Contributor Author

jg-rp commented Nov 17, 2024

Closing in favour of #770.

@jg-rp jg-rp closed this Nov 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants