DOCX reader discards figure caption (regression) #9610
Comments
Probably due to one of these items from the .2 changelog:
Here's what I'm seeing:
This is not being parsed as a Figure at all, which is perhaps a separate issue.
caption.docx has
then in doc.styles:
so the caption name is

Note that changing the styleId to "ImageCaption" allows the caption to be parsed as a regular paragraph.

So, I think what is going on is this: Pandoc identifies paragraphs with style 'caption' as table captions. They are not emitted as regular paragraphs, but because we do not at this point have special handling for figures with captions, the result is that the caption gets dropped altogether.

Obviously not a great situation, but the fix would involve proper support for captioned images as Figure elements, which we've never had.
Notes:

See also #9391.

Word represents captions with a p element either before or after the image or table. The caption paragraph has pPr

Pandoc's own docx writer uses ImageCaption and TableCaption classes. We should probably just use Caption to be more like Word.
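To check which paragraph style a caption carries in a given file, something like the following could be used. It is only a sketch, not pandoc's reader logic: it assumes the standard WordprocessingML layout, where a paragraph's style is referenced by `w:pPr/w:pStyle` in `word/document.xml` and the style's display name lives in `w:style/w:name` in `word/styles.xml`. The function names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (used by document.xml and styles.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def style_names(styles_xml):
    """Map each styleId to the display name given by its w:name element."""
    names = {}
    for style in ET.fromstring(styles_xml).findall("w:style", NS):
        style_id = style.get(f"{{{W}}}styleId")
        name = style.find("w:name", NS)
        if style_id is not None and name is not None:
            names[style_id] = name.get(f"{{{W}}}val")
    return names


def paragraph_styles(document_xml, styles_xml):
    """Return (styleId, style name, text) for every paragraph, in document order."""
    names = style_names(styles_xml)
    result = []
    for p in ET.fromstring(document_xml).iter(f"{{{W}}}p"):
        pstyle = p.find("w:pPr/w:pStyle", NS)
        style_id = pstyle.get(f"{{{W}}}val") if pstyle is not None else None
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        result.append((style_id, names.get(style_id), text))
    return result


def docx_paragraph_styles(path):
    """Hypothetical convenience wrapper: read both parts from a .docx archive."""
    with zipfile.ZipFile(path) as z:
        return paragraph_styles(z.read("word/document.xml"),
                                z.read("word/styles.xml"))
```

Calling `docx_paragraph_styles("caption.docx")` would list each paragraph with its styleId and style name, which makes it easy to see whether the caption paragraph uses the style that the reader treats as a table caption.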
Problem description
The latest pandoc version, 3.1.12.3 (and possibly 3.1.12.2), seems to drop figure captions when importing from docx.
Latest known version where it works: 3.1.12.1
Reproduction
I am attaching a docx file and two JSON outputs, one from pandoc 3.1.12.1 and one from 3.1.12.3.
Here's the diff (only the caption is missing):
caption.docx
test-latest.json
test-3-1-11.json
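For anyone who wants to repeat the comparison on the attached JSON files, a rough sketch follows. It relies only on the documented shape of pandoc's JSON output (a top-level object with a `blocks` list whose entries carry a `t` tag); the helper names are invented here, and which attachment corresponds to which pandoc version is an assumption based on the file names.

```python
import json
from collections import Counter


def block_types(ast):
    """List the constructor tag ("Para", "Table", ...) of each top-level block."""
    return [block["t"] for block in ast["blocks"]]


def missing_blocks(old_ast, new_ast):
    """Return top-level blocks that appear in old_ast but not in new_ast.

    Blocks are compared by their canonical JSON serialization, and duplicates
    are counted, so a block that occurs twice in old_ast but once in new_ast
    is reported once.
    """
    remaining = Counter(json.dumps(b, sort_keys=True) for b in new_ast["blocks"])
    missing = []
    for block in old_ast["blocks"]:
        key = json.dumps(block, sort_keys=True)
        if remaining[key] > 0:
            remaining[key] -= 1
        else:
            missing.append(block)
    return missing


def load_ast(path):
    """Load a pandoc JSON document, e.g. one written by `pandoc -t json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With the attachments, `missing_blocks(load_ast("test-3-1-11.json"), load_ast("test-latest.json"))` should surface the dropped caption paragraph, assuming the first file is the older output. The comparison is top-level only, so a caption nested inside another block would show up as the whole enclosing block.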