-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Towards more optimal XMath (compacting XMDual) #1309
Conversation
902c10e
to
725c05a
Compare
t/math/compact_dual.xml
Outdated
<resource src="LaTeXML.css" type="text/css"/> | ||
<resource src="ltx-article.css" type="text/css"/> | ||
<para xml:id="p1"> | ||
<p><Math mode="inline" tex="\@CSYMBOL{power}x2+\@CSYMBOL{power}y3" text="power@(x, 2) + power@(y, 3)" xml:id="p1.m1"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, an unexpected aside, it looks like \@APPLY
and \DUAL
don't have a reversion back into the tex
attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learning more as I go along, the tex
attribute is dedicated to the presentation TeX, which I've now obeyed by passing the hide_content_reversion
option to \DUAL
. Added a commit.
Feels a little crutchy, should be automatable in the long run... Similarly the content_tex
attribute needs thoughtful reversions in the dual content branch, what a TeX.pool comment referred to as:
# NOTE: work through this systematically!
and is currently not ideal in my latest XML file. It also reminded me of the old discussion around having two annotations for the source TeX of a formula ( #432 ), just linking it in case someone's picking up breadcrumbs later on. Not something I want to jump into right now, but I ended up noticing... The improved XMDual pruning is the only focus here, sorry for the diversion.
I also added an example that does: \def\degtwo{\dual@atom{2}{\prime\prime}}
\def\derive2#1{
\dual@infix{derivative-implicit-variable}{^}{#1}{\degtwo}} The Generating the compact XMath: <XMApp>
<XMTok meaning="derivative-implicit-variable" role="SUPERSCRIPTOP" scriptpos="post1"/>
<XMTok font="italic" role="UNKNOWN" xml:id="p2.m1.1">f</XMTok>
<XMDual xml:id="p2.m1.2">
<XMTok meaning="2"/>
<XMWrap>
<XMTok fontsize="70%" name="prime" role="SUPOP">′</XMTok>
<XMTok fontsize="70%" name="prime" role="SUPOP">′</XMTok>
</XMWrap>
</XMDual>
</XMApp> Note how the code leaves the inner dual as-is, as the trees diverge - single token meaning for two tokens of presentation. |
ae225b3
to
ba0f91a
Compare
push @new_args, $p_arg; | ||
next; } # content-refs-pres, OK | ||
my $p_idref = $p_arg->getAttribute('idref'); | ||
if ($p_idref && ($p_idref eq ($c_arg->getAttribute('xml:id') || ''))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, if-elsif implied...
lib/LaTeXML/Core/Document.pm
Outdated
# 1) we saw such a difference beforehand, or | ||
# 2) the tree is too complex - give up on compacting and return. | ||
# we only handle two XMToks differing for now. | ||
if ($single_duality || ($self->getNodeQName($c_arg) ne 'ltx:XMTok') || $self->getNodeQName($p_arg) ne 'ltx:XMTok') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does parg need to be a token? (so long as it's consistently id/ref'd)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just a starting point. And I am not sure if passing the meaning attribute to other types of nodes wouldn't confuse the post-processor, so I'd like some examples... E.g. XMArray, XMRow, XMCell, or even another XMDual nested underneath could cause some hiccups. But maybe they're all workable already, haven't tried.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need examples; I need counter-examples!
I think you're looking at it a bit backwards; The attributes inform post-processor, if that makes it confused, that's the post-processor's problem. Or alternatively, assume the attributes are there for a reason, possibly bad, but they don't need to justify themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I'll transfer to arbitrary nodes then, scary as that seems... Just landed a commit that removes this guard, though I still haven't found a good example of what it does. Should probably rebase the a11y branch on this one and see how things change in the showcase...
lib/LaTeXML/Core/Document.pm
Outdated
$n_arg->unbindNode; | ||
$compact_apply->appendChild($n_arg); } | ||
# if the dual has a role/id migrate them to the XMApp | ||
for my $attr_key (qw(role xml:id)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lost an lpadding
value in my case. You need to copy more attributes from the XMDual to the new XMApp, not to mention from the old XMApp; probably all attributes (unless already set on the old XMApp). And you'll want to use Document's setAttribute, since it manages ids.
It smells more&more like we need some sort of $doc->mergeAttributes($oldnode,$newnode)
that knows which attributes to ignore, which for bookkeeping (id), which to append (class), and who knows else what crops up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, alright. I could also rename the XMDual to an XMApp, that way no attribute moving needs to happen, I'll just replace its children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you'd still want to copy attributes from the old (presentation) XMApp. (besides renaming nodes w/libxml2 always worries me; there've been problems in the past (and yeah, I know it's used rarely))
With the current approach all the XMRef's are in the content, so you really could just replace the XMDual with the presentation's XMApp (ie. an unwrap), still copying attributes from the XMDual down, but I can see why you might not want to assume the current arrangement is permanent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Since the schema is designed to allow XMRefs in either branch, might as well implement with the allowance in mind... I've added a commit that copies all XMDual attributes onto the new "compact" XMApp, unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, I am not sure I can use the Document's setAttribute
, as it reports an id conflict, since the id was already recorded on the now compacted/replaced XMDual. And we don't seem to have a Document removeAttribute
, which removes the previous bookkeeping. Would this cause problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, figured it out, I first had to replace the node to un-document the xml:id, and then I could safely re-record it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but there's a unRecordID
; it's a bit of dancing around, though.
Hmm, setAttribute
doesn't use that when it replaces the id, though.
I made a new branch that combines this dual-compacting PR with the WIP acessibility PR, called |
aadeb92
to
105130d
Compare
<XMApp role="ID" xml:id="S1.Ex1.m1.10"> | ||
<XMTok decl_id="S1.XMD3" name="widehat" role="OVERACCENT">^</XMTok> | ||
<XMTok decl_id="S1.XMD4" font="italic" role="ID" xml:id="S1.Ex1.m1.3">x</XMTok> | ||
</XMApp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brucemiller I rebased to the latest master this morning, and think we should consider merging this dual-cleanup PR. It has been included in the a11y showcase for a week now, all seemed in order, and the declare.xml diff is extra encouraging.
Or let me know what else you'd like me to try and improve, I've moved on from working on this one.
Ah, I see travis failed in the PR with something I wanted to ask, namely the reversion of
Question being, isn't it cleaner to have the Edit: here's the other side of the coin with my new test case:
|
I think making BTW: you've still got a couple of failing test cases (compact_dual and (cough) physics) |
Interestingly, the physics errors pointed out the New Improved way that Laplacian is handled; namely reducing the |
3de2992
to
e5b2a9e
Compare
Nicely spotted Laplacian example! I've updated all test files, so travis should pass now. But indeed, I should also double-check the post-processor can handle it, here is a link to the code diff you mention: |
Ah, I remembered some secret experimental knowledge from times past. There is a bit of hidden code in MathML.pm which checks if an incoming XMApp has a role In other words, I can have a very clean rule where "whenever we compact a dual to transfer a meaning attribute onto an XMApp with no role, give it role ID, and treat it as a decorated symbol/embellished operator". Checked in, and even gets cross-references done for the laplacian, when doing |
Well similar, but not that. It doesn't necessarily have role=ID. But if it has meaning, it should be treated as a csymbol (or potentially a built-in for pragmatic). |
Well, the agreement in the MathML.pm experiment is that all XMApp elements with role=ID can be treated as a csymbol. Should we revise and modify that? Maybe introduce a new role? Or just move away from role entirely and as you say do "XMApp with meaning and no role is csymbol" |
From the content pov, it's not embellished --- it simply is the laplacian; and it's not an ID, it's an operator in this case.
|
More concretely: the point is that if it has meaning, treat it as if it were a token. I suspect the same approach works for OM |
145f6d6
to
4c62944
Compare
Cool! It is slightly worrying that we have many different handlers for very related things but it's a start. I've pushed this, and verified it has good cross-references and an |
Seems to work;
|
…s/decorated symbols
4c62944
to
884f73c
Compare
Thanks, tested and added! |
$self->replaceNode($dual, $compact_apply); | ||
# transfer the attributes after replacing, so that the bookkeeping has been undone | ||
for my $key (keys %transfer_attrs) { | ||
$self->setAttribute($compact_apply, $key, $transfer_attrs{$key}); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably should have a matching $self->closeElementAt($compact_apply);
which would run any afterClose thingies (since I just tracked down cases where that didn't happen).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't have a test for that, oops! Done.
lib/LaTeXML/Core/Document.pm
Outdated
if (ref $n_arg eq 'ARRAY') { | ||
my ($c_arg, $p_arg) = @$n_arg; | ||
# Transfer all c_arg attributes over, it should be primary? | ||
for my $attr_key (qw(decl_id meaning name)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should include omcd
in this list.
The dlmf diffs are kinda scary to read, although the resulting xmath is nice (occasionally surprising). But except for missing |
Thanks for doing the QA work, much appreciated! omcd has been added. |
WooHoo!! Thanks! |
Hi @brucemiller ! This is a standalone PR that can be merged and discussed independently of anything else.
Following up on #1305 , I discovered a nice nook where I can have my cake and eat it too - namely the
pruneXMDuals
routine in Document.pm. It turns out this is the right timing (after math parsing, so all script attachment magic is finished and settled in, and also all XMArg and XMWrap logic is collapsed as appropriate).Having found the hook, so far I have only a single case that I need compacted, and I've added it. It arises from scripts, but also seems to commonly be "overdone" when using the dual form of
DefMath
. Namely:This form, in my eyes, is asking to be compacted where the meaning and role of the leading operator token are merged on the same element, and then the realized argument nodes are copied along. Resulting in a single apply node, with no dual. As mentioned in the other thread, it feels more "optimal" for XMath's philosophy, where we try to keep the markup as fine-grained as possible, only creating duals where the presentation and content trees do not align. In the "semantically annotated scripts" cases, the two trees do align, and the "meaning" of the script can reunite with the script's presentational XMTok.
I had to change a single test case to get this PR running, which I don't fully understand and would appreciate a review + guidance on. Of course, I also added my own new test for using
\power
, which uses duals explicitly directly via the TeX macros (\DUAL
and friends), but pleasingly does not produce any XMDual elements, as they can be nicely compacted down to an apply.