
Towards more optimal XMath (compacting XMDual) #1309

Merged: 15 commits were merged into brucemiller:master on Aug 6, 2020.


dginev (Collaborator) commented Jul 13, 2020

Hi @brucemiller! This is a standalone PR that can be merged and discussed independently of anything else.

Following up on #1305, I discovered a nice nook where I can have my cake and eat it too, namely the pruneXMDuals routine in Document.pm. It turns out this runs at the right time: after math parsing, so all script-attachment magic is finished and settled, and all XMArg and XMWrap logic has been collapsed as appropriate.

Having found the hook, so far I have only one case that needs compacting, and I've added it. It arises from scripts, but it also seems to be commonly "overdone" when using the dual form of DefMath. Namely:

<XMDual>
<XMApp>
  <XMTok meaning="real-meaning"/>
  <XMRef idref='arg1'/>
  <XMRef idref='arg2'/>
  ...
</XMApp>
<XMApp> <!-- no wrap, because we're looking at it after math parsing, and it parsed into an apply -->
  <XMTok role="some-role">maybe some pres content</XMTok>
  <XM(*) xml:id='arg1'>...</XM(*)>
  <XM(*) xml:id='arg2'>...</XM(*)>
  ...
</XMApp>
</XMDual>

This form, in my eyes, is asking to be compacted: the meaning and role of the leading operator token are merged onto the same element, and the realized argument nodes are copied along, resulting in a single apply node with no dual. As mentioned in the other thread, this feels more "optimal" for XMath's philosophy, where we try to keep the markup as fine-grained as possible and only create duals where the presentation and content trees do not align. In the "semantically annotated scripts" cases the two trees do align, and the "meaning" of the script can be reunited with the script's presentational XMTok.
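To make the rule concrete, here is a minimal sketch in Python using the standard-library ElementTree rather than LaTeXML's Perl Document API. It illustrates the idea under stated assumptions and is not the PR's pruneXMDuals code. Element names are unnamespaced, and only the simplest shape is handled: an operator XMTok in each branch, with every content argument an XMRef to the matching presentation argument. `compact_dual` and `compact_all` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def compact_dual(dual):
    """Return one XMApp that can replace `dual`, or None if its branches diverge."""
    if len(dual) != 2:
        return None
    content, presentation = dual[0], dual[1]
    if content.tag != "XMApp" or presentation.tag != "XMApp":
        return None
    if len(content) == 0 or len(content) != len(presentation):
        return None
    c_op, *c_args = list(content)
    p_op, *p_args = list(presentation)
    if c_op.tag != "XMTok" or p_op.tag != "XMTok":
        return None
    # Each content argument must be an XMRef to the matching presentation argument.
    for c_arg, p_arg in zip(c_args, p_args):
        ref = c_arg.get("idref")
        if c_arg.tag != "XMRef" or ref is None or ref != p_arg.get(XML_ID):
            return None
    # Merge the operator: keep the presentation token, add the content meaning.
    op = ET.Element("XMTok", dict(p_op.attrib))
    op.text = p_op.text
    if c_op.get("meaning") is not None:
        op.set("meaning", c_op.get("meaning"))
    # The dual's own attributes (xml:id, role, padding, ...) move to the new XMApp.
    compact = ET.Element("XMApp", dict(dual.attrib))
    compact.append(op)
    compact.extend(p_args)  # the realized presentation arguments
    return compact


def compact_all(parent):
    """Recursively replace compactable XMDual descendants of `parent`, in place."""
    for i, child in enumerate(list(parent)):
        compact_all(child)  # inner duals are handled first
        if child.tag == "XMDual":
            compact = compact_dual(child)
            if compact is not None:
                parent.remove(child)
                parent.insert(i, compact)


EXAMPLE = """<Math>
  <XMDual xml:id="p1.m1.1">
    <XMApp><XMTok meaning="power"/><XMRef idref="a1"/><XMRef idref="a2"/></XMApp>
    <XMApp><XMTok role="SUPERSCRIPTOP" scriptpos="post1"/><XMTok xml:id="a1">x</XMTok><XMTok xml:id="a2">2</XMTok></XMApp>
  </XMDual>
</Math>"""

if __name__ == "__main__":
    math = ET.fromstring(EXAMPLE)
    compact_all(math)
    print(ET.tostring(math, encoding="unicode"))
```

Inner duals are compacted first. Anything that does not match the shape, such as a single content token standing for several presentation tokens, is left as an XMDual.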

I had to change a single test case to get this PR working. I don't fully understand why, and I would appreciate a review and guidance on it. I also added my own new test for \power, which uses duals explicitly via the TeX macros (\DUAL and friends) but pleasingly does not produce any XMDual elements, since they can be compacted down to an apply.

dginev force-pushed the prune-bigger-duals branch from 902c10e to 725c05a on July 13, 2020 19:56
<resource src="LaTeXML.css" type="text/css"/>
<resource src="ltx-article.css" type="text/css"/>
<para xml:id="p1">
<p><Math mode="inline" tex="\@CSYMBOL{power}x2+\@CSYMBOL{power}y3" text="power@(x, 2) + power@(y, 3)" xml:id="p1.m1">
dginev (Collaborator Author)

Btw, an unexpected aside: it looks like \@APPLY and \DUAL don't have a reversion back into the tex attribute.

dginev (Collaborator Author), Jul 14, 2020

Learning more as I go along: the tex attribute is dedicated to the presentation TeX, which I now respect by passing the hide_content_reversion option to \DUAL. Added a commit.

It feels a little like a crutch and should be automatable in the long run. Similarly, the content-tex attribute needs thoughtful reversions in the dual's content branch, which a TeX.pool comment refers to as:

# NOTE: work through this systematically!

and that is currently not ideal in my latest XML file. It also reminded me of the old discussion about having two annotations for the source TeX of a formula (#432). I'm linking it in case someone is picking up breadcrumbs later on. It's not something I want to jump into right now; I just ended up noticing it. The improved XMDual pruning is the only focus here, so sorry for the diversion.

dginev (Collaborator Author) commented Jul 13, 2020

I also added an example that does:

\def\degtwo{\dual@atom{2}{\prime\prime}}
\def\derive2#1{
  \dual@infix{derivative-implicit-variable}{^}{#1}{\degtwo}}

The \dual@atom and \dual@infix macros are conveniences I made just for that one test file, based entirely on the low-level command sequences provided by TeX.pool.

This generates the compact XMath:

<XMApp>
  <XMTok meaning="derivative-implicit-variable" role="SUPERSCRIPTOP" scriptpos="post1"/>
  <XMTok font="italic" role="UNKNOWN" xml:id="p2.m1.1">f</XMTok>
  <XMDual xml:id="p2.m1.2">
    <XMTok meaning="2"/>
    <XMWrap>
      <XMTok fontsize="70%" name="prime" role="SUPOP">′</XMTok>
      <XMTok fontsize="70%" name="prime" role="SUPOP">′</XMTok>
    </XMWrap>
  </XMDual>
</XMApp>

Note how the code leaves the inner dual as-is, since the trees diverge: a single content token carries the meaning for two presentation tokens.

dginev force-pushed the prune-bigger-duals branch from ae225b3 to ba0f91a on July 14, 2020 20:42
push @new_args, $p_arg;
next; } # content-refs-pres, OK
my $p_idref = $p_arg->getAttribute('idref');
if ($p_idref && ($p_idref eq ($c_arg->getAttribute('xml:id') || ''))) {
brucemiller (Owner)

ah, if-elsif implied...

# 1) we saw such a difference beforehand, or
# 2) the tree is too complex - give up on compacting and return.
# we only handle two XMToks differing for now.
if ($single_duality || ($self->getNodeQName($c_arg) ne 'ltx:XMTok') || $self->getNodeQName($p_arg) ne 'ltx:XMTok') {
brucemiller (Owner)

Why does p_arg need to be a token? (So long as it's consistently id/ref'd.)

dginev (Collaborator Author)

It was just a starting point. I'm not sure whether passing the meaning attribute to other types of nodes would confuse the post-processor, so I'd like some examples. E.g. XMArray, XMRow, XMCell, or even another XMDual nested underneath could cause some hiccups. But maybe they're all workable already; I haven't tried.

brucemiller (Owner)

You don't need examples; I need counter-examples!

I think you're looking at it a bit backwards. The attributes inform the post-processor; if that confuses it, that's the post-processor's problem. Alternatively, assume the attributes are there for a reason, possibly a bad one, but they don't need to justify themselves.

dginev (Collaborator Author)

Alright, I'll transfer to arbitrary nodes then, scary as that seems. I just landed a commit that removes this guard, though I still haven't found a good example of what it does. I should probably rebase the a11y branch on this one and see how things change in the showcase.
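Here is a hedged sketch of the argument-pairing step behind the guard discussed above, with the XMTok requirement removed. It is written in Python/ElementTree with hypothetical names (`pair_arguments`, `refers_to`) and is not the Perl in Document.pm. An argument referenced by an XMRef from the other branch is kept as-is, in either direction. At most one differing (content, presentation) pair of any node type is tolerated before the sketch gives up.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def refers_to(ref, target):
    """True if `ref` is an XMRef whose idref names `target`'s xml:id."""
    idref = ref.get("idref")
    return ref.tag == "XMRef" and idref is not None and idref == target.get(XML_ID)


def pair_arguments(content_app, presentation_app):
    """Align the children of two XMApps for compaction.

    Returns a list whose entries are either a node to keep as-is or a
    (content, presentation) tuple still to be merged, or None when the
    branches differ in more than one place (give up on compacting).
    """
    c_args, p_args = list(content_app), list(presentation_app)
    if len(c_args) != len(p_args):
        return None
    new_args = []
    single_duality = False
    for c_arg, p_arg in zip(c_args, p_args):
        if refers_to(c_arg, p_arg):
            new_args.append(p_arg)  # content refs presentation: keep presentation
        elif refers_to(p_arg, c_arg):
            new_args.append(c_arg)  # presentation refs content: keep content
        elif single_duality:
            return None  # a second difference: too complex, give up
        else:
            single_duality = True  # first difference: any node types allowed
            new_args.append((c_arg, p_arg))
    return new_args
```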

$n_arg->unbindNode;
$compact_apply->appendChild($n_arg); }
# if the dual has a role/id migrate them to the XMApp
for my $attr_key (qw(role xml:id)) {
brucemiller (Owner)

This lost an lpadding value in my case. You need to copy more attributes from the XMDual to the new XMApp, not to mention from the old XMApp; probably all attributes (unless already set on the old XMApp). And you'll want to use Document's setAttribute, since it manages ids.

It smells more and more like we need some sort of $doc->mergeAttributes($oldnode,$newnode) that knows which attributes to ignore, which need bookkeeping (id), which to append (class), and who knows what else crops up.
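One possible shape for the mergeAttributes helper floated here is sketched below in Python over plain attribute dicts. The function name and every policy detail are assumptions. Keys already set on the new node win, class-like keys are appended as space-separated tokens, the ignore list is left to the caller, and a copied xml:id is handed to a bookkeeping callback.

```python
def merge_attributes(old, new, ignore=(), append=("class",), record_id=None):
    """Copy attributes from dict `old` into dict `new` in place, and return `new`.

    Hypothetical policy: values already on `new` win (e.g. the presentation
    XMApp's role), `append` keys are merged as space-separated tokens,
    `ignore` keys are skipped, and a copied xml:id is passed to `record_id`.
    """
    for key, value in old.items():
        if key in ignore:
            continue
        if key in append and key in new:
            tokens = new[key].split()
            tokens += [t for t in value.split() if t not in tokens]
            new[key] = " ".join(tokens)
        elif key not in new:
            new[key] = value
            if key == "xml:id" and record_id is not None:
                record_id(value)  # let the document update its id bookkeeping
    return new
```

With this policy, the lpadding that was lost in the review case would be carried over from the XMDual unless the presentation XMApp already sets it.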

dginev (Collaborator Author)

Hm, alright. I could also rename the XMDual to an XMApp. That way no attributes need to be moved, and I'd just replace its children.

brucemiller (Owner)

No, you'd still want to copy attributes from the old (presentation) XMApp. (Besides, renaming nodes with libxml2 always worries me; there have been problems in the past, and yeah, I know it's used rarely.)

With the current approach all the XMRefs are in the content, so you really could just replace the XMDual with the presentation's XMApp (i.e. an unwrap), still copying attributes down from the XMDual. But I can see why you might not want to assume the current arrangement is permanent.

dginev (Collaborator Author)

Indeed. Since the schema allows XMRefs in either branch, I might as well implement with that allowance in mind. I've added a commit that unconditionally copies all XMDual attributes onto the new "compact" XMApp.

dginev (Collaborator Author), Jul 16, 2020

Btw, I'm not sure I can use the Document's setAttribute. It reports an id conflict, since the id was already recorded for the now compacted/replaced XMDual. We also don't seem to have a Document removeAttribute that would undo the previous bookkeeping. Would this cause problems?

dginev (Collaborator Author)

Never mind, I figured it out: I first had to replace the node to un-record its xml:id, and then I could safely re-record it.

brucemiller (Owner)

No, but there's an unRecordID; it's a bit of dancing around, though.

Hmm, setAttribute doesn't use that when it replaces the id, though.
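The ordering problem in this exchange can be reproduced with a toy id index. There, setAttribute reports an id conflict until the old node has been replaced. The code below is a hypothetical Python model, not LaTeXML's Document class; `ToyDocument`, `replace_node` and the other names are illustrative only. In this model `replace_node` forgets the ids of the removed subtree, so setting the same xml:id on the new node only succeeds afterwards.

```python
class Node:
    """Minimal tree node; `attrs` may hold an "xml:id"."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.children = []


class ToyDocument:
    """Hypothetical document keeping an id -> node index."""

    def __init__(self):
        self.ids = {}

    def record_id(self, node_id, node):
        owner = self.ids.get(node_id)
        if owner is not None and owner is not node:
            raise ValueError(f"id conflict: {node_id}")
        self.ids[node_id] = node

    def unrecord_ids(self, node):
        # Forget the ids of `node` and its whole subtree.
        node_id = node.attrs.get("xml:id")
        if node_id is not None and self.ids.get(node_id) is node:
            del self.ids[node_id]
        for child in node.children:
            self.unrecord_ids(child)

    def set_attribute(self, node, key, value):
        if key == "xml:id":
            self.record_id(value, node)  # raises before touching the node
        node.attrs[key] = value

    def replace_node(self, parent, old, new):
        parent.children[parent.children.index(old)] = new
        self.unrecord_ids(old)
```

Setting the id before the replacement raises the conflict. Replacing first and then setting the id, as the PR ends up doing, records the id against the compact XMApp.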

dginev (Collaborator Author) commented Jul 16, 2020

I made a new branch, called dginev/a11y-deploy, that combines this dual-compacting PR with the WIP accessibility PR, and I'm now using it for the showcase. All seems in order so far; I'm still examining it.

dginev force-pushed the prune-bigger-duals branch from aadeb92 to 105130d on August 5, 2020 13:52
<XMApp role="ID" xml:id="S1.Ex1.m1.10">
<XMTok decl_id="S1.XMD3" name="widehat" role="OVERACCENT">^</XMTok>
<XMTok decl_id="S1.XMD4" font="italic" role="ID" xml:id="S1.Ex1.m1.3">x</XMTok>
</XMApp>
dginev (Collaborator Author)

@brucemiller I rebased onto the latest master this morning and think we should consider merging this dual-cleanup PR. It has been included in the a11y showcase for a week now, everything seemed in order, and the declare.xml diff is extra encouraging.

Or let me know what else you'd like me to try to improve; I've moved on from working on this one.

dginev (Collaborator Author) commented Aug 5, 2020

Ah, I see Travis failed on this PR with something I wanted to ask about, namely the reversion of \@CSYMBOL in the tex attribute. The failure is:

# Difference at line 358 for t/complex/physics

#       got : '<Math mode="inline" tex="\rightarrow a" text="absent rightarrow absolutevalue@(a)" xml:id="S1.T1.m19">'

#  expected : '<Math mode="inline" tex="\rightarrow\@CSYMBOL{absolutevalue}a" text="absent rightarrow absolutevalue@(a)" xml:id="S1.T1.m19">'

The question being: isn't it cleaner to have the \@CSYMBOL reversion make it "disappear" from the tex attribute? I'm trying not to reopen the big can of worms about semantic vs. non-semantic tex attributes for math, but this is a curious case where the token deposits no visible ink. Rather, it just feeds the content branch of an XMDual.

Edit: here's the other side of the coin with my new test case:

# Difference at line 8 for t/math/compact_dual

#       got : '    <p><Math content-tex="\@CSYMBOL{power}x2+\@CSYMBOL{power}y3" mode="inline" tex="x^{2}+y^{3}" text="power@(x, 2) + power@(y, 3)" xml:id="p1.m1">'

#  expected : '    <p><Math content-tex="x2+y3" mode="inline" tex="x^{2}+y^{3}" text="power@(x, 2) + power@(y, 3)" xml:id="p1.m1">'

brucemiller (Owner)

I think making \@CSYMBOL disappear is the wrong approach; I just started experimenting with letting \DUAL have a reversion keyval option. That seems like the right level to handle the issue, although with physics.sty it's kinda scary how to create that reversion.

BTW: you've still got a couple of failing test cases (compact_dual and (cough) physics)

brucemiller (Owner)

Interestingly, the physics errors pointed out the New Improved way the Laplacian is handled: the \nabla^2 is reduced to an XMApp tree, but with meaning=laplacian. Arguably that's correct and good, but the cmml & om post-processors will need to be updated to recognize such constructs.

dginev force-pushed the prune-bigger-duals branch from 3de2992 to e5b2a9e on August 5, 2020 17:05
dginev (Collaborator Author) commented Aug 5, 2020

Nicely spotted Laplacian example! I've updated all test files, so Travis should pass now. But indeed, I should also double-check that the post-processor can handle it. Here is a link to the code diff you mention:

e5b2a9e#diff-c95bec3fd2f012f6a8dc521538db2fceR1522-R1529

dginev (Collaborator Author) commented Aug 5, 2020

Ah, I remembered some secret experimental knowledge from times past. There is a bit of hidden code in MathML.pm that checks whether an incoming XMApp has role ID, in which case it treats it as a decorated symbol via cmml_decoratedSymbol.

In other words, I can have a very clean rule where "whenever we compact a dual to transfer a meaning attribute onto an XMApp with no role, give it role ID, and treat it as a decorated symbol/embellished operator".

Checked in. It even gets the cross-references done for the Laplacian when running with --pmml --cmml.

brucemiller (Owner)

Well, similar, but not that. It doesn't necessarily have role=ID. But if it has meaning, it should be treated as a csymbol (or potentially as a built-in, for pragmatic).

dginev (Collaborator Author) commented Aug 5, 2020

Well, the agreement in the MathML.pm experiment is that all XMApp elements with role=ID can be treated as a csymbol. Should we revise that? Maybe introduce a new role? Or move away from role entirely and, as you say, do "XMApp with meaning and no role is a csymbol"?

Edit: link to the discussed code

brucemiller (Owner)

From the content POV, it's not embellished: it simply is the Laplacian. And it's not an ID; it's an operator in this case.
You'll want to add this before the embellished-operator clause in cmml_internal:

    if(my $meaning = $node->getAttribute('meaning')){
      return &{ lookupContent('Token', $node->getAttribute('role'), $meaning) }($node); }
    elsif (($node->getAttribute('role') || '') eq 'ID') {

brucemiller (Owner)

More concretely: the point is that if it has meaning, it should be treated as if it were a token. I suspect the same approach works for OM.
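The toy version below shows the suggested dispatch order in Python/ElementTree, producing content MathML as strings. An XMApp carrying meaning is converted exactly like a token, before any role-based handling. The `PRAGMATIC` set and the function names are assumptions. MathML.pm and OpenMath.pm do this through their lookupContent/lookupConverter tables instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of meanings that have built-in content MathML elements.
PRAGMATIC = {"laplacian", "plus", "times"}


def token_to_cmml(node):
    """Convert anything treated as a token: built-in element, csymbol, or ci."""
    meaning = node.get("meaning")
    if meaning in PRAGMATIC:
        return f"<{meaning}/>"
    if meaning:
        return f"<csymbol>{meaning}</csymbol>"
    return f"<ci>{node.text or ''}</ci>"


def to_cmml(node):
    if node.tag == "XMTok":
        return token_to_cmml(node)
    if node.tag == "XMApp":
        if node.get("meaning"):
            # Test meaning first: such an XMApp converts as if it were a token.
            # A role="ID" / embellished-operator branch would come after this.
            return token_to_cmml(node)
        return "<apply>" + "".join(to_cmml(child) for child in node) + "</apply>"
    raise ValueError(f"unsupported element: {node.tag}")
```

For example, a compacted \nabla^2 f, where the inner XMApp carries meaning="laplacian", converts to an apply of the built-in laplacian element to f.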

dginev force-pushed the prune-bigger-duals branch from 145f6d6 to 4c62944 on August 5, 2020 19:47
dginev (Collaborator Author) commented Aug 5, 2020

Cool! It is slightly worrying that we have many different handlers for closely related things, but it's a start. I've pushed this and verified that it produces good cross-references and an m:laplacian content element (pragmatic!).

brucemiller (Owner) commented Aug 5, 2020

Seems to work; here is the change to om_expr_aux in OpenMath.pm:

  elsif ($tag eq 'ltx:XMApp') {
    if(my $meaning = $node->getAttribute('meaning')){
      my $sub = lookupConverter('Token', $node->getAttribute('role'), $meaning);
      return &$sub($node); }
   ....

dginev force-pushed the prune-bigger-duals branch from 4c62944 to 884f73c on August 5, 2020 19:56
dginev (Collaborator Author) commented Aug 5, 2020

Thanks, tested and added!

$self->replaceNode($dual, $compact_apply);
# transfer the attributes after replacing, so that the bookkeeping has been undone
for my $key (keys %transfer_attrs) {
$self->setAttribute($compact_apply, $key, $transfer_attrs{$key}); }
brucemiller (Owner)

Probably should have a matching $self->closeElementAt($compact_apply), which would run any afterClose thingies (I just tracked down cases where that didn't happen).

dginev (Collaborator Author)

Didn't have a test for that, oops! Done.

if (ref $n_arg eq 'ARRAY') {
my ($c_arg, $p_arg) = @$n_arg;
# Transfer all c_arg attributes over, it should be primary?
for my $attr_key (qw(decl_id meaning name)) {
brucemiller (Owner)

Should include omcd in this list.

brucemiller (Owner)

The DLMF diffs are kinda scary to read, although the resulting XMath is nice (occasionally surprising). But except for the missing omcd, it looks good.

dginev (Collaborator Author) commented Aug 6, 2020

Thanks for doing the QA work, much appreciated! omcd has been added.
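For reference, the per-argument transfer with the final attribute list (decl_id, meaning, name, and the requested omcd) might look like the following Python/ElementTree sketch. `merge_pair` is a hypothetical name. Letting the content value overwrite an existing presentation value is an assumption, suggested by the "it should be primary?" comment in the code.

```python
import xml.etree.ElementTree as ET

# Attributes moved from the content node onto the presentation node when a
# differing (content, presentation) pair is merged, per the review thread.
TRANSFER_KEYS = ("decl_id", "meaning", "name", "omcd")


def merge_pair(c_arg, p_arg):
    """Copy TRANSFER_KEYS from c_arg onto p_arg (content wins); return p_arg."""
    for key in TRANSFER_KEYS:
        value = c_arg.get(key)
        if value is not None:
            p_arg.set(key, value)
    return p_arg
```

Presentation-only attributes such as role and scriptpos are untouched, so the merged token keeps its layout information while gaining the content annotations.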

brucemiller (Owner)

WooHoo!! Thanks!

brucemiller merged commit effdc2d into brucemiller:master on Aug 6, 2020
brucemiller deleted the prune-bigger-duals branch on August 6, 2020 16:32