Towards more optimal XMath (compacting XMDual) #1309

dginev · 2020-07-13T19:51:30Z

Hi @brucemiller ! This is a standalone PR that can be merged and discussed independently of anything else.

Following up on #1305 , I discovered a nice nook where I can have my cake and eat it too - namely the pruneXMDuals routine in Document.pm. It turns out this is the right timing (after math parsing, so all script attachment magic is finished and settled in, and also all XMArg and XMWrap logic is collapsed as appropriate).

Having found the hook, so far I have only a single case that I need compacted, and I've added it. It arises from scripts, but also seems to commonly be "overdone" when using the dual form of DefMath. Namely:

<XMDual>
<XMApp>
  <XMTok meaning="real-meaning"/>
  <XMRef idref='arg1'/>
  <XMRef idref='arg2'/>
  ...
</XMApp>
<XMApp> <!-- no wrap, because we're looking at it after math parsing, and it parsed into an apply -->
  <XMTok role="some-role">maybe some pres content</XMTok>
  <XM(*) xml:id='arg1'>...</XM(*)>
  <XM(*) xml:id='arg2'>...</XM(*)>
  ...
</XMApp>
</XMDual>

This form, in my eyes, is asking to be compacted where the meaning and role of the leading operator token are merged on the same element, and then the realized argument nodes are copied along. Resulting in a single apply node, with no dual. As mentioned in the other thread, it feels more "optimal" for XMath's philosophy, where we try to keep the markup as fine-grained as possible, only creating duals where the presentation and content trees do not align. In the "semantically annotated scripts" cases, the two trees do align, and the "meaning" of the script can reunite with the script's presentational XMTok.

I had to change a single test case to get this PR running, which I don't fully understand and would appreciate a review + guidance on. Of course, I also added my own new test for using \power, which uses duals explicitly directly via the TeX macros (\DUAL and friends), but pleasingly does not produce any XMDual elements, as they can be nicely compacted down to an apply.

dginev · 2020-07-13T19:59:36Z

t/math/compact_dual.xml

+  <resource src="LaTeXML.css" type="text/css"/>
+  <resource src="ltx-article.css" type="text/css"/>
+  <para xml:id="p1">
+    <p><Math mode="inline" tex="\@CSYMBOL{power}x2+\@CSYMBOL{power}y3" text="power@(x, 2) + power@(y, 3)" xml:id="p1.m1">


Btw, an unexpected aside, it looks like \@APPLY and \DUAL don't have a reversion back into the tex attribute.

Learning more as I go along, the tex attribute is dedicated to the presentation TeX, which I've now obeyed by passing the hide_content_reversion option to \DUAL. Added a commit.

Feels a little crutchy, should be automatable in the long run... Similarly the content_tex attribute needs thoughtful reversions in the dual content branch, what a TeX.pool comment referred to as:

# NOTE: work through this systematically!

and is currently not ideal in my latest XML file. It also reminded me of the old discussion around having two annotations for the source TeX of a formula ( #432 ), just linking it in case someone's picking up breadcrumbs later on. Not something I want to jump into right now, but I ended up noticing... The improved XMDual pruning is the only focus here, sorry for the diversion.

dginev · 2020-07-13T23:17:36Z

I also added an example that does:

\def\degtwo{\dual@atom{2}{\prime\prime}}
\def\derive2#1{
  \dual@infix{derivative-implicit-variable}{^}{#1}{\degtwo}}

The \dual@atom and \dual@infix macros are conveniences I made just for that one test file, based entirely on the low-level command sequences provided by TeX.pool.

Generating the compact XMath:

<XMApp>
  <XMTok meaning="derivative-implicit-variable" role="SUPERSCRIPTOP" scriptpos="post1"/>
  <XMTok font="italic" role="UNKNOWN" xml:id="p2.m1.1">f</XMTok>
  <XMDual xml:id="p2.m1.2">
    <XMTok meaning="2"/>
    <XMWrap>
      <XMTok fontsize="70%" name="prime" role="SUPOP">′</XMTok>
      <XMTok fontsize="70%" name="prime" role="SUPOP">′</XMTok>
    </XMWrap>
  </XMDual>
</XMApp>

Note how the code leaves the inner dual as-is, as the trees diverge - single token meaning for two tokens of presentation.

brucemiller · 2020-07-16T12:46:53Z

lib/LaTeXML/Core/Document.pm

+      push @new_args, $p_arg;
+      next; }    # content-refs-pres, OK
+    my $p_idref = $p_arg->getAttribute('idref');
+    if ($p_idref && ($p_idref eq ($c_arg->getAttribute('xml:id') || ''))) {


ah, if-elsif implied...

brucemiller · 2020-07-16T12:49:55Z

lib/LaTeXML/Core/Document.pm

+    # 1) we saw such a difference beforehand, or
+    # 2) the tree is too complex - give up on compacting and return.
+    # we only handle two XMToks differing for now.
+    if ($single_duality || ($self->getNodeQName($c_arg) ne 'ltx:XMTok') || $self->getNodeQName($p_arg) ne 'ltx:XMTok') {


why does parg need to be a token? (so long as it's consistently id/ref'd)

It was just a starting point. And I am not sure if passing the meaning attribute to other types of nodes wouldn't confuse the post-processor, so I'd like some examples... E.g. XMArray, XMRow, XMCell, or even another XMDual nested underneath could cause some hiccups. But maybe they're all workable already, haven't tried.

You don't need examples; I need counter-examples!

I think you're looking at it a bit backwards; The attributes inform post-processor, if that makes it confused, that's the post-processor's problem. Or alternatively, assume the attributes are there for a reason, possibly bad, but they don't need to justify themselves.

Alright, I'll transfer to arbitrary nodes then, scary as that seems... Just landed a commit that removes this guard, though I still haven't found a good example of what it does. Should probably rebase the a11y branch on this one and see how things change in the showcase...

brucemiller · 2020-07-16T12:57:06Z

lib/LaTeXML/Core/Document.pm

+    $n_arg->unbindNode;
+    $compact_apply->appendChild($n_arg); }
+  # if the dual has a role/id migrate them to the XMApp
+  for my $attr_key (qw(role xml:id)) {


This lost an lpadding value in my case. You need to copy more attributes from the XMDual to the new XMApp, not to mention from the old XMApp; probably all attributes (unless already set on the old XMApp). And you'll want to use Document's setAttribute, since it manages ids.

It smells more&more like we need some sort of $doc->mergeAttributes($oldnode,$newnode) that knows which attributes to ignore, which for bookkeeping (id), which to append (class), and who knows else what crops up.

Hm, alright. I could also rename the XMDual to an XMApp, that way no attribute moving needs to happen, I'll just replace its children.

No, you'd still want to copy attributes from the old (presentation) XMApp. (besides renaming nodes w/libxml2 always worries me; there've been problems in the past (and yeah, I know it's used rarely))

With the current approach all the XMRef's are in the content, so you really could just replace the XMDual with the presentation's XMApp (ie. an unwrap), still copying attributes from the XMDual down, but I can see why you might not want to assume the current arrangement is permanent.

Indeed. Since the schema is designed to allow XMRefs in either branch, might as well implement with the allowance in mind... I've added a commit that copies all XMDual attributes onto the new "compact" XMApp, unconditionally.

Btw, I am not sure I can use the Document's setAttribute, as it reports an id conflict, since the id was already recorded on the now compacted/replaced XMDual. And we don't seem to have a Document removeAttribute, which removes the previous bookkeeping. Would this cause problems?

Nevermind, figured it out, I first had to replace the node to un-document the xml:id, and then I could safely re-record it.

No, but there's a unRecordID; it's a bit of dancing around, though.

Hmm, setAttribute doesn't use that when it replaces the id, though.

dginev · 2020-07-16T14:44:48Z

I made a new branch that combines this dual-compacting PR with the WIP acessibility PR, called dginev/a11y-deploy and am now using it for the showcase. All seems in order so far, still examining it.

…ing method

…ting XMDual

dginev · 2020-08-05T13:55:11Z

t/math/declare.xml

+                <XMApp role="ID" xml:id="S1.Ex1.m1.10">
+                  <XMTok decl_id="S1.XMD3" name="widehat" role="OVERACCENT">^</XMTok>
+                  <XMTok decl_id="S1.XMD4" font="italic" role="ID" xml:id="S1.Ex1.m1.3">x</XMTok>
+                </XMApp>


@brucemiller I rebased to the latest master this morning, and think we should consider merging this dual-cleanup PR. It has been included in the a11y showcase for a week now, all seemed in order, and the declare.xml diff is extra encouraging.

Or let me know what else you'd like me to try and improve, I've moved on from working on this one.

dginev · 2020-08-05T14:24:17Z

Ah, I see travis failed in the PR with something I wanted to ask, namely the reversion of \@CSYMBOL in the tex attribute. The failure is:

# Difference at line 358 for t/complex/physics

#       got : '<Math mode="inline" tex="\rightarrow a" text="absent rightarrow absolutevalue@(a)" xml:id="S1.T1.m19">'

#  expected : '<Math mode="inline" tex="\rightarrow\@CSYMBOL{absolutevalue}a" text="absent rightarrow absolutevalue@(a)" xml:id="S1.T1.m19">'

Question being, isn't it cleaner to have the \@csymbol reversion make it "disappear" from the tex attribute. Trying not to reopen the big can of worms about semantics vs non-semantic tex attributes for math, but this is a curious case where the token deposits no visible ink, rather it just feeds the content branch of an XMDual.

Edit: here's the other side of the coin with my new test case:

# Difference at line 8 for t/math/compact_dual

#       got : '    <p><Math content-tex="\@CSYMBOL{power}x2+\@CSYMBOL{power}y3" mode="inline" tex="x^{2}+y^{3}" text="power@(x, 2) + power@(y, 3)" xml:id="p1.m1">'

#  expected : '    <p><Math content-tex="x2+y3" mode="inline" tex="x^{2}+y^{3}" text="power@(x, 2) + power@(y, 3)" xml:id="p1.m1">'

brucemiller · 2020-08-05T16:27:49Z

I think making \@csymbol disappear is the wrong approach; I just started experimenting with letting \DUAL have a reversion keyval option. That seems like the right level to handle the issue, although physics.sty is kinda scary how to create that reversion.

BTW: you've still got a couple of failing test cases (compact_dual and (cough) physics)

brucemiller · 2020-08-05T16:42:39Z

Interestingly, the physics errors pointed out the New Improved way that Laplacian is handled; namely reducing the \nabla^2 to an XMApp tree, but with meaning=laplacian. Arguably that's correct and good, but the cmml & om postprocessorswill need to be updated to recognize such constructs.

dginev · 2020-08-05T17:07:34Z

Nicely spotted Laplacian example! I've updated all test files, so travis should pass now. But indeed, I should also double-check the post-processor can handle it, here is a link to the code diff you mention:

e5b2a9e#diff-c95bec3fd2f012f6a8dc521538db2fceR1522-R1529

dginev · 2020-08-05T18:58:46Z

Ah, I remembered some secret experimental knowledge from times past. There is a bit of hidden code in MathML.pm which checks if an incoming XMApp has a role ID, in which case it treats it as a decorated symbol via cmml_decoratedSymbol.

In other words, I can have a very clean rule where "whenever we compact a dual to transfer a meaning attribute onto an XMApp with no role, give it role ID, and treat it as a decorated symbol/embellished operator".

Checked in, and even gets cross-references done for the laplacian, when doing --pmml --cmml.

brucemiller · 2020-08-05T19:29:22Z

Well similar, but not that. It doesn't necessarily have role=ID. But if it has meaning, it should be treated as a csymbol (or potentially a built-in for pragmatic).

dginev · 2020-08-05T19:34:36Z

Well, the agreement in the MathML.pm experiment is that all XMApp elements with role=ID can be treated as a csymbol. Should we revise and modify that? Maybe introduce a new role? Or just move away from role entirely and as you say do "XMApp with meaning and no role is csymbol"

Edit: link to the discussed code

brucemiller · 2020-08-05T19:37:59Z

From the content pov, it's not embellished --- it simply is the laplacian; and it's not an ID, it's an operator in this case.
You'll want to add this before the embellished operator clause in cmml_internal

    if(my $meaning = $node->getAttribute('meaning')){
      return &{ lookupContent('Token', $node->getAttribute('role'), $meaning) }($node); }
    elsif (($node->getAttribute('role') || '') eq 'ID') {

brucemiller · 2020-08-05T19:46:27Z

More concretely: the point is that if it has meaning, treat it as if it were a token. I suspect the same approach works for OM

dginev · 2020-08-05T19:47:40Z

Cool! It is slightly worrying that we have many different handlers for very related things but it's a start. I've pushed this, and verified it has good cross-references and an m:laplacian content element (pragmatic!)

brucemiller · 2020-08-05T19:50:34Z

Seems to work; om_expr_aux in OpenMath.pm

  elsif ($tag eq 'ltx:XMApp') {
    if(my $meaning = $node->getAttribute('meaning')){
      my $sub = lookupConverter('Token', $node->getAttribute('role'), $meaning);
      return &$sub($node); }
   ....

…s/decorated symbols

dginev · 2020-08-05T19:57:25Z

Thanks, tested and added!

brucemiller · 2020-08-05T23:08:45Z

lib/LaTeXML/Core/Document.pm

+  $self->replaceNode($dual, $compact_apply);
+  # transfer the attributes after replacing, so that the bookkeeping has been undone
+  for my $key (keys %transfer_attrs) {
+    $self->setAttribute($compact_apply, $key, $transfer_attrs{$key}); }


probably should have a matching $self->closeElementAt($compact_apply); which would run any afterClose thingies (since I just tracked down cases where that didn't happen).

Didn't have a test for that, oops! Done.

brucemiller · 2020-08-06T14:45:44Z

lib/LaTeXML/Core/Document.pm

+    if (ref $n_arg eq 'ARRAY') {
+      my ($c_arg, $p_arg) = @$n_arg;
+      # Transfer all c_arg attributes over, it should be primary?
+      for my $attr_key (qw(decl_id meaning name)) {


Should include omcd in this list.

brucemiller · 2020-08-06T14:47:43Z

The dlmf diffs are kinda scary to read, although the resulting xmath is nice (occasionally surprising). But except for missing omcd it looks good.

dginev · 2020-08-06T16:17:54Z

Thanks for doing the QA work, much appreciated! omcd has been added.

brucemiller · 2020-08-06T16:32:24Z

WooHoo!! Thanks!

dginev force-pushed the prune-bigger-duals branch from 902c10e to 725c05a Compare July 13, 2020 19:56

dginev commented Jul 13, 2020

View reviewed changes

dginev force-pushed the prune-bigger-duals branch from ae225b3 to ba0f91a Compare July 14, 2020 20:42

brucemiller reviewed Jul 16, 2020

View reviewed changes

dginev added 10 commits August 5, 2020 09:51

compact XMDuals with mirror XMApp and single XMTok diff

1952125

fix comment

a83c521

add second derivative dual compacting example

2b3e975

follow the presentation path for the tex attributes of compact_dual.xml

eaff09b

@csymbol should revert to nothing for now

1a25cbc

less confusing \dual@script name

73bc03e

copy all attrs of compacted XMDual onto new XMApp

9691b31

transfer id after replacing, and with Document::setAttribute bookkeep…

5504344

…ing method

allow for arbitrary presentation subtree vs 1 content XMTok in compac…

fee7e39

…ting XMDual

rebase needed an update to declare.xml

105130d

dginev force-pushed the prune-bigger-duals branch from aadeb92 to 105130d Compare August 5, 2020 13:52

dginev commented Aug 5, 2020

View reviewed changes

dginev added 2 commits August 5, 2020 13:02

back off from modding \@CSYMBOL reversion

c825335

also simplify duals in complex/physics test

e5b2a9e

dginev force-pushed the prune-bigger-duals branch from 3de2992 to e5b2a9e Compare August 5, 2020 17:05

dginev force-pushed the prune-bigger-duals branch from 145f6d6 to 4c62944 Compare August 5, 2020 19:47

handle XMApp with meaning that are stand-ins for embellished operator…

884f73c

…s/decorated symbols

dginev force-pushed the prune-bigger-duals branch from 4c62944 to 884f73c Compare August 5, 2020 19:56

brucemiller reviewed Aug 5, 2020

View reviewed changes

close compact_apply when done

3c50f9b

brucemiller reviewed Aug 6, 2020

View reviewed changes

include omcd, thanks Bruce

3466712

brucemiller merged commit effdc2d into brucemiller:master Aug 6, 2020

brucemiller deleted the prune-bigger-duals branch August 6, 2020 16:32

dginev mentioned this pull request Aug 20, 2020

prune XMDuals with mirrored single tokens #1327

Merged

Towards more optimal XMath (compacting XMDual) #1309

Towards more optimal XMath (compacting XMDual) #1309

Conversation

dginev commented Jul 13, 2020 • edited Loading

Choose a reason for hiding this comment

dginev Jul 14, 2020 • edited Loading

Choose a reason for hiding this comment

dginev commented Jul 13, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dginev Jul 16, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dginev commented Jul 16, 2020 • edited Loading

Choose a reason for hiding this comment

dginev commented Aug 5, 2020 • edited Loading

brucemiller commented Aug 5, 2020

brucemiller commented Aug 5, 2020

dginev commented Aug 5, 2020

dginev commented Aug 5, 2020

brucemiller commented Aug 5, 2020

dginev commented Aug 5, 2020 • edited Loading

brucemiller commented Aug 5, 2020

brucemiller commented Aug 5, 2020

dginev commented Aug 5, 2020

brucemiller commented Aug 5, 2020 • edited Loading

dginev commented Aug 5, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brucemiller commented Aug 6, 2020

dginev commented Aug 6, 2020

brucemiller commented Aug 6, 2020

dginev commented Jul 13, 2020 •

edited

Loading

dginev Jul 14, 2020 •

edited

Loading

dginev commented Jul 13, 2020 •

edited

Loading

dginev Jul 16, 2020 •

edited

Loading

dginev commented Jul 16, 2020 •

edited

Loading

dginev commented Aug 5, 2020 •

edited

Loading

dginev commented Aug 5, 2020 •

edited

Loading

brucemiller commented Aug 5, 2020 •

edited

Loading

dginev commented Aug 5, 2020 •

edited

Loading