-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api-minor] Add support for basic structure tree for accessibility. #13171
Conversation
Example of how this works:
For testing, I was hoping to use something that would provide us a dump of the accessibility tree, but nothing quit matched what we want. Chrome's puppeteer has a way to get the tree, but it doesn't handle aria-owns. Firefox's devtools also has a way to dump the tree to JSON, but there's no easy way to expose this to content. Alternatively, I started making a "FauxScreenReader", but it was starting to get rather complicated, so I'll move that to a patch on its own. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few comments, based on a very quick look at the code; I can try to review this in more detail later on in case that's helpful.
src/core/evaluator.js
Outdated
args = [args[0].name]; | ||
args = [ | ||
args[0].name, | ||
args[1] && args[1].get ? args[1].get("MCID") : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you actually mean to do a isDict(args[1])
-check here, since this currently looks a bit weird?
src/core/struct_tree.js
Outdated
} | ||
|
||
get role() { | ||
const name = this.dict.has("S") ? this.dict.get("S").name : ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to also validate that the /S-entry actually is a Name
?
src/core/struct_tree.js
Outdated
const obj = this.dict.get("Pg"); | ||
if (obj) { | ||
pageObjId = obj.objId; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general question, here and elsewhere in the patch: Don't you perhaps want to access a /Ref using getRaw
instead, with a following isRef
-check and toString
-call, rather than trusting that a dictionary always have the objId
-property set?
One advantage of that approach, I think, would be fewer data lookups overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better will do.
src/core/struct_tree.js
Outdated
// Find the dictionary for the kid. | ||
let kidDict = null; | ||
if (isRef(kid)) { | ||
kidDict = this.dict.xref.fetchIfRef(kid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit redundant, since you've already checked that it's a /Ref above; hence please change fetchIfRef
-> fetch
.
Furthermore, are we absolutely sure that all Dictionaries have a xref
-property, or would it be generally safer to pass in the XRef-instance when initializing these classes?
web/pdf_page_view.js
Outdated
// The structure tree must be generated after the text layer for the | ||
// aria-owns to work. | ||
this._handleTextLayerRendered = event => { | ||
if (event.pageNumber !== this.pdfPage.pageNumber) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.pdfPage.pageNumber
-> this.id
web/pdf_page_view.js
Outdated
if (this.structTreeLayerFactory && this.textLayer && this.canvas) { | ||
// The structure tree must be generated after the text layer for the | ||
// aria-owns to work. | ||
this._handleTextLayerRendered = event => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name-nit: Perhaps this._onTextLayerRendered
instead?
web/pdf_page_view.js
Outdated
this.canvas.appendChild(treeDom); | ||
}); | ||
}; | ||
this.eventBus._on("textlayerrendered", this._handleTextLayerRendered); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no guarantee that this event listener is ever removed, if the page is e.g. destroyed, hence please add something like the following to the cancelRendering
-method:
if (this._onTextLayerRendered) {
this.eventBus._off("textlayerrendered", this._onTextLayerRendered);
this._onTextLayerRendered = null;
}
@@ -24,7 +24,7 @@ | |||
line-height: 1; | |||
} | |||
|
|||
.textLayer > span { | |||
.textLayer span { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change this, it should probably also be changed here since that file is based on this one: https://github.com/mozilla/pdf.js/blob/master/test/text_layer_test.css#L26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, there's a handful of .textLayer > span
-related rules in https://github.com/mozilla/pdf.js/blob/master/web/viewer.css so please check if those also need to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how I missed those. Technically one or two can be left since the property is inherited, but for consistency it's probably better to just change them all.
src/core/evaluator.js
Outdated
args = [args[0].name]; | ||
args = [ | ||
args[0].name, | ||
args[1] && isDict(args[1]) ? args[1].get("MCID") : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The following would probably suffice here
args[1] && isDict(args[1]) ? args[1].get("MCID") : null, | |
args[1] instanceof Dict ? args[1].get("MCID") : null, |
src/core/evaluator.js
Outdated
type: "beginMarkedContentProps", | ||
id: | ||
isDict(args[1]) && args[1].has("MCID") | ||
? `page${pageObjId}_mcid${args[1].get("MCID")}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the changes suggested above, this would become
? `page${pageObjId}_mcid${args[1].get("MCID")}` | |
? `${this.idFactory.getPageObjId()}_mcid${args[1].get("MCID")}` |
src/core/evaluator.js
Outdated
textContent.items.push({ | ||
type: "beginMarkedContentProps", | ||
id: | ||
isDict(args[1]) && args[1].has("MCID") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to validate that the MCID
-data is of the expected type here?
src/core/obj.js
Outdated
if (ex instanceof MissingDataException) { | ||
throw ex; | ||
} | ||
warn("Unable to structTreeRoot info."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Missing word here?
warn("Unable to structTreeRoot info."); | |
warn("Unable to read structTreeRoot info."); |
src/core/struct_tree.js
Outdated
obj.children = []; | ||
parent.children.push(obj); | ||
if (node.dict.has("Alt")) { | ||
obj.alt = node.dict.get("Alt"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the Alt
value be validated, before being used here, to ensure that it's of the expected type?
Also, since I guess that this's a string you probably need wrap to it in stringToPDFString
as well.
src/core/struct_tree.js
Outdated
obj.alt = node.dict.get("Alt"); | ||
} | ||
|
||
for (let i = 0; i < node.kids.length; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Here, and below, for...of
instead?
web/struct_tree_layer_builder.js
Outdated
} | ||
|
||
_setAttributes(structElement, htmlElement) { | ||
if ("alt" in structElement) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, and below, can't we compare with ... !== undefined
rather than using in
?
src/core/struct_tree.js
Outdated
} | ||
} | ||
|
||
addNode(dict, map, level) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addNode(dict, map, level) { | |
addNode(dict, map, level = 0) { |
src/core/struct_tree.js
Outdated
*/ | ||
get serializable() { | ||
const MAX_DEPTH = 40; | ||
function nodeToSerializable(node, parent, level) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function nodeToSerializable(node, parent, level) { | |
function nodeToSerializable(node, parent, level = 0) { |
fccd302
to
2ce79b5
Compare
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/8b88b70d7e3875e/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://3.101.106.178:8877/0caba9cd0fe84fa/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.67.70.0:8877/8b88b70d7e3875e/output.txt Total script time: 24.90 mins
Image differences available at: http://54.67.70.0:8877/8b88b70d7e3875e/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://3.101.106.178:8877/0caba9cd0fe84fa/output.txt Total script time: 28.61 mins
Image differences available at: http://3.101.106.178:8877/0caba9cd0fe84fa/reftest-analyzer.html#web=eq.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the reference-tests, a number of the text
ones now seem to have something rendered in the top-left corner of the pages. Edit: Oh, it seems that those are actually the "container" <span>
-elements that's added by this patch, for pages with struct-trees; can we please use e.g. a suitable CSS class to hide those special <span>
-elements?
With the text
-tests, and the last nits, fixed this looks good to me; thank you!
src/core/document.js
Outdated
combineTextItems, | ||
sink, | ||
pageObjId: this.ref.toString(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line, since it should be unused now.
src/core/evaluator.js
Outdated
sink, | ||
seenStyles = new Set(), | ||
pageObjId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line, since it should be unused now.
src/core/evaluator.js
Outdated
sink: sinkWrapper, | ||
seenStyles, | ||
pageObjId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line, since it should be unused now.
web/pdf_page_view.js
Outdated
this._onTextLayerRendered = null; | ||
this.pdfPage.getStructTree().then(tree => { | ||
const treeDom = this.structTreeLayer.render(tree); | ||
this.structTree = treeDom; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: You're never accessing and/or resetting this property anywhere, is it actually necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need to rebuild the struct tree on re-renders, but I'll fix that in a follow up.
When a PDF is "marked" we now generate a separate DOM that represents the structure tree from the PDF. This DOM is inserted into the <canvas> element and allows screen readers to walk the tree and have more information about headings, images, links, etc. To link the structure tree DOM (which is empty) to the text layer aria-owns is used. This required modifying the text layer creation so that marked items are now tracked.
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @brendandahl received. Current queue size: 0 Live output at: http://54.67.70.0:8877/6f9a1b79b206976/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @brendandahl received. Current queue size: 0 Live output at: http://3.101.106.178:8877/f9c71badc41b1c0/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.67.70.0:8877/6f9a1b79b206976/output.txt Total script time: 24.87 mins
Image differences available at: http://54.67.70.0:8877/6f9a1b79b206976/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://3.101.106.178:8877/f9c71badc41b1c0/output.txt Total script time: 29.00 mins
Image differences available at: http://3.101.106.178:8877/f9c71badc41b1c0/reftest-analyzer.html#web=eq.log |
Really nice work! I also did a pass over this, but it looks really good to me. /botio makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/b5777ba4db7ebc1/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @timvandermeij received. Current queue size: 1 Live output at: http://3.101.106.178:8877/2cf94295d99eef4/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/b5777ba4db7ebc1/output.txt Total script time: 22.91 mins
|
From: Bot.io (Windows)SuccessFull output at http://3.101.106.178:8877/2cf94295d99eef4/output.txt Total script time: 26.51 mins
|
When a PDF is "marked" we now generate a separate DOM that represents
the structure tree from the PDF. This DOM is inserted into the
element and allows screen readers to walk the tree and have more
information about headings, images, links, etc. To link the structure
tree DOM (which is empty) to the text layer aria-owns is used. This
required modifying the text layer creation so that marked items are
now tracked.
Fixes #6269
This will need more work to wire up the annotation layer. We'll also need to come up with a way to better handle alt text for images. Currently, the browser/screen reader sees an empty
figure
so it doesn't announce the alt text.