Support TeX content in message list #1335

Draft
wants to merge 3 commits into
base: main

rajveermalviya
Member

Fixes #46

Member

gnprice commented Feb 6, 2025

Thanks for posting this! This structure looks promising, and I see there are some TODOs highlighting bits of CSS that may be more challenging.

Which area or two do you think are likely to be the toughest to handle?

Comment on lines +1124 to +1133
case 'sizing':
case 'fontsize-ensurer':
  // .sizing,
  // .fontsize-ensurer { ... }
  if (index + 2 < spanClasses.length) {
    final resetSizeClass = spanClasses[index + 1];
    final sizeClass = spanClasses[index + 2];

    final resetSizeClassSuffix = RegExp(r'^reset-size(\d\d?)$').firstMatch(resetSizeClass)?.group(1);
    final sizeClassSuffix = RegExp(r'^size(\d\d?)$').firstMatch(sizeClass)?.group(1);

I think this is probably one area where we can do it more simply than it's expressed in the CSS — I believe the point of these classes is to take it to the size named in the sizeNN class. The CSS rules are more complicated because they have to express it in terms relative to the enclosing size, in order to use CSS's em concept, but we can instead go straight to the desired size (and just scale it relative to the overall font size that prevails outside of the whole math node).
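
A minimal sketch of that "go straight to the desired size" idea, in plain Dart. The multiplier table is an assumption: it is meant to mirror the size1 through size11 scale in KaTeX's source, and should be checked against the KaTeX version in use. The names `kSizeMultipliers` and `fontSizeForSizeClass` are hypothetical, not from this PR.

```dart
/// Assumed em multipliers for KaTeX's size1..size11 classes, relative to
/// the base math font size. Verify against the KaTeX version in use.
const List<double> kSizeMultipliers = [
  0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.44, 1.728, 2.074, 2.488,
];

final RegExp _sizeClassPattern = RegExp(r'^size(\d\d?)$');

/// Returns the absolute font size for a `sizeNN` class, scaled from
/// [baseFontSize] (the size prevailing outside the whole math node).
/// Returns null if [sizeClass] is not a recognized `sizeNN` class.
double? fontSizeForSizeClass(String sizeClass, double baseFontSize) {
  final suffix = _sizeClassPattern.firstMatch(sizeClass)?.group(1);
  if (suffix == null) return null;
  final n = int.parse(suffix);
  if (n < 1 || n > kSizeMultipliers.length) return null;
  return baseFontSize * kSizeMultipliers[n - 1];
}
```

With this shape the `reset-sizeNN` class would not need to be consulted at all, since the result depends only on the target size and the outer font size.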

(Also this sort of complex logic should ultimately go in the parsing phase, as you might already realize. Putting it here in the widget build phase is perfectly sensible at this prototype stage of the work, though — that's exactly what I did for content parsing as a whole in the very early stage of prototyping this app.)

Comment on lines +1041 to +1047
case 'vlist-t2':
  // .vlist-t2 { ... }
  break; // TODO

case 'vlist-s':
  // .vlist-s { ... }
  break; // TODO

I think these two are another case where our implementation can be simpler than the CSS: we can probably just ignore both of these.

From the source:

    .vlist-t2 {
        margin-right: -2px;
    }

    .vlist-s {
        // This cell solves Safari rendering problems. It has text content, so
        // its baseline is used for the table. A very small font avoids line-box
        // issues; absolute units prevent user font-size overrides from breaking
        // rendering. Safari refuses to make the box zero-width, so we give it
        // a known width and compensate with negative right margin on the
        // inline-table. To prevent the "width: min-content" Chrome workaround
        // from shrinking this box, we also set min-width.
        display: table-cell;
        vertical-align: bottom;
        font-size: 1px;
        width: 2px;
        min-width: 2px;
    }

So vlist-s is there to work around quirks in Safari. And then vlist-t2 is apparently there to work around a quirk introduced by the vlist-s workarounds — it's that negative right margin mentioned in the comment. (In at least your \sqrt x example, that class appears together with vlist-t, i.e. on the table.)

(We'd want to have the parser validate that it sees vlist-t2 only alongside vlist-t, and complain if it sees it somewhere else, to confirm that understanding.)
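
A minimal sketch of such a check, assuming the parser has the class list of a single span available as a `List<String>`. The function name `isValidVlistT2Usage` is hypothetical.

```dart
/// Returns false if `vlist-t2` appears without `vlist-t` among the
/// classes of a single span; true otherwise.
bool isValidVlistT2Usage(List<String> spanClasses) {
  if (!spanClasses.contains('vlist-t2')) return true;
  return spanClasses.contains('vlist-t');
}
```

A parser could treat a `false` result the same way it treats any other unexpected structure, for example by falling back to an unimplemented-content node.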

Comment on lines +1037 to +1039
case 'vlist':
  // .vlist { display: table-cell; ... }
  break; // TODO

One thought that I have looking at the styles for this one:

    .vlist {
        display: table-cell;
        vertical-align: bottom;
        position: relative;

        > span {
            display: block;
            height: 0;
            position: relative;

            > span {
                display: inline-block;
            }

            > .pstrut {
                overflow: hidden;
                width: 0;
            }
        }
    }

is that those styles look like they're written with an assumption that the children and grandchildren of a .vlist element will have a fairly particular structure: any child span is given a particular role, and any grandchild span is given a different particular role.

So I think there may be quite a bit of structure in the subtrees that KaTeX generates that use this vlist class. For example, again just looking at your \sqrt x example, it might be:

  • each child of a .vlist has two children,
  • one of which is a .pstrut with a height but no other styles and no children
  • (and the other of which then has more-or-less arbitrary recursive structure).

If that pattern holds, or something like it, then that could substantially simplify thinking through what this CSS does and how to apply it.

Yeah, so here's the one place in KaTeX's code that makes a .vlist (or a .vlist-t etc):
https://github.com/KaTeX/KaTeX/blob/aada26a3700df5205961d9ca7bf23824c771c813/src/buildCommon.js#L589-L623

That code has been very stable; git log -L shows it hasn't been touched substantively since KaTeX/KaTeX@56cfc7c in 2018, and hasn't been touched at all (even for unrelated refactorings) since 2021. Which fits with KaTeX in general, really — it's been very stable for years. So I think we can be fine with relying on specifics of the patterns of what it does.

From that code:

  • A .vlist-t always has either one or two children, both of them span.vlist-r.
  • A .vlist-r always has a first child which is span.vlist, and possibly a second child which is span.vlist-s; no others.
  • A .vlist-s is highly constrained. (Cf. the comment on vlist-s and vlist-t2 above.)
  • A .vlist either has no children (the depthStrut in the linked code), or its children are the realChildren array that comes from the code just above the linked code.
  • Each child of a .vlist comes from this line:
    const childWrap = makeSpan(classes, [pstrut, elem], undefined, style);
    (with some styles added just after that line).
    • That first child is a span.pstrut with a height but no other styles and no children, just as I guessed above.
    • The other child seems indeed more or less arbitrary; it gets passed to this function as an HtmlDomNode.

So I think we can have our parser expect that structure when it sees a .vlist-t, and expect to never see any of the vlist-* classes outside that structure. And then I think the widgets code basically turns it into a Column.
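
The structural expectations listed above could be checked along these lines. This is only a sketch in plain Dart: `Span` is a hypothetical minimal stand-in for a parsed HTML element, and the function names are made up for illustration. It does not cover the check that vlist-* classes never appear outside this structure, nor the later step of building a Column.

```dart
/// Hypothetical minimal stand-in for a parsed HTML span.
class Span {
  const Span(this.classes, [this.children = const []]);
  final List<String> classes;
  final List<Span> children;

  bool hasClass(String name) => classes.contains(name);
}

/// A .vlist-t has one or two children, each a well-formed span.vlist-r.
bool isWellFormedVlistT(Span table) {
  if (!table.hasClass('vlist-t')) return false;
  final rows = table.children;
  if (rows.isEmpty || rows.length > 2) return false;
  return rows.every(_isWellFormedVlistR);
}

/// A .vlist-r has a span.vlist first, then optionally a span.vlist-s.
bool _isWellFormedVlistR(Span row) {
  if (!row.hasClass('vlist-r')) return false;
  final cells = row.children;
  if (cells.isEmpty || cells.length > 2) return false;
  if (!_isWellFormedVlist(cells.first)) return false;
  return cells.length == 1 || cells[1].hasClass('vlist-s');
}

/// A .vlist is either empty (the depth strut) or holds wrappers whose
/// children are exactly [childless pstrut, arbitrary content].
bool _isWellFormedVlist(Span vlist) {
  if (!vlist.hasClass('vlist')) return false;
  return vlist.children.every((wrap) =>
      wrap.children.length == 2 &&
      wrap.children.first.hasClass('pstrut') &&
      wrap.children.first.children.isEmpty);
}
```

Once a subtree passes a check like this, each wrapper's second child is the only part that needs recursive parsing.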


For further background, here's the commit that introduced most of that complexity, including the use of table layout:
KaTeX/KaTeX@56cfc7c
I haven't read that commit closely enough to understand all the details of what's going on in it, and I think we don't need to. The key takeaway, apart from what one gets by walking through the code to produce the structure I listed above, is:

The point of all this "vlist" stuff is to make a vertical list. And most of the complexity in the CSS and HTML structure is there in order to cancel out complexity in how CSS wants to do the layout.

So if we parse the output with that in mind, we should be able to pick out the simpler thing it's ultimately trying to express, and implement that directly, without having to go into all the details of either the complex CSS behavior it's working around or the complex tricks it uses to work around that behavior.

That's the strategy I'm hoping will hold for the other relatively complex areas of KaTeX's HTML and CSS as well. For SVGs we'll resort to an SVG implementation, but I think this strategy may be able to cover everything else.

Development

Successfully merging this pull request may close these issues.

TeX in the message list
2 participants