-
-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Warn on duplicate or conflicting YAML metadata key-values within a metadata block #10312
Comments
I would certainly like to be able to issue a warning in such cases, but we're using the Hackage |
Interestingly, |
We could maybe use |
Using Data.Yaml.Internal.decodeHelper_ Data.Libyaml.decode |
I am not entirely sure the underlying warning there is right: There's no docs here explaining what this warning is supposed to do, and the codebase is unfamiliar, but if I'm reading this right, this would detect conflicting keys (by finding a key-value pair in the original Set not in the final Set?) but it would not detect redundant keys (because then the redundancy would have disappeared when the original Set was created)? |
This works: import Data.ByteString (ByteString)
import Data.Yaml.Internal (decodeHelper_, Warning, ParseException)
import Data.Aeson (FromJSON)
import qualified Text.Libyaml as Y
decodeWarn :: FromJSON a
=> ByteString -> IO (Either ParseException ([Warning], a))
decodeWarn = decodeHelper_ . Y.decode IT does give you duplicate key warnings, e.g.
|
Oh, I see, you're more worried about the case where you have multiple YAML metadata sections in the same document? I read too hastily. Yes, that part is handled by pandoc. |
Ah, so (As long as the different YAML blocks are handled separately (otherwise we risk spurious warnings from reasonable metadata combination uses as I outlined), but Pandoc presumably does the obvious thing of reading them separately & then merging them later on for use and that would avoid any issues here.) |
That's right. OK, I will plan to implement this warning when I get a chance. |
Wait, never mind, this won't work because it's in IO. We have to be able to do YAML parsing outside IO. It may be possible to rewrite the function so it doesn't need IO, but it would take some understanding of conduit and the libyaml and yaml libraries. |
It turns out that |
Pandoc currently silently allows key-value metadata fields which are redundant or conflicting within a YAML metadata block. This allows errors or unintended metadata risk to build up over time.
I suggest that Pandoc ought to warn about there being more than 1 instance of a key within a YAML metadata block (but not across multiple YAML metadata blocks).
The current Pandoc metadata block documentation doesn't say much about how fields are required to be unique or non-overlapping, saying just
Currently, Pandoc appears to read values in like a
Data.Map.fromList
operation: the last pair wins, overriding all previous key-values.Example of a conflicting pair of key-values:
yields
value2
, and no warnings:(And if it's just a redundant key, like
key: value2
twice, then it yieldsvalue2
, also no warnings.)I was trying out a new site feature involving the metadata, and discovered I had ~10 instances, built up over the past decades, where a Pandoc YAML metadata field was either: 1. required but missing; 2. duplicated twice exactly; or 3. had two keys with different conflicting values (only one of which was right). Pandoc obviously hadn't warned or hinted about them at all, or else I would've fixed them long ago.
Now, enforcing required metadata is out of scope for Pandoc and should be handled by another tool like the build script, but the other 2 should be handled by Pandoc.
They have to be handled by Pandoc because it is difficult to handle them as a user. They do not appear in the Pandoc API: the
unMeta
map has already erased the duplicates. So there is no way to add a check or lint easily at the Pandoc user level. It needs to be done while reading the original YAML.One possibility would be to change the
fromList
to append, but this would probably be confusing and no one would use it. So it would be more useful to just warn about the cases of redundant keys.When might we want redundant or conflicting keys in a metadata block?
I have a hard time thinking of any legitimate usecase for having the exact same key-value pair twice in the same metadata block. It's easy to imagine that it would be useful for having redundant pairs across multiple metadata blocks, for templating or defaults, that sort of thing, but not within the same metadata block. So I think warning on a redundant pair is a very safe warning to add for what is almost certainly an oversight or error.
Conflicting keys are a little trickier. Again, you obviously might need it across blocks, but within blocks? It's possible that there are systems which try to munge metadata blocks and inject user metadata settings after a default, or vice-versa. But it seems like anything you would do like that where the redundancy/override is the goal, you could do more safely and more cleanly by using multiple metadata blocks?
For example, instead of having multiple
title
orkey
values in a single metadata block, just write multiple ones in a clean, compact, easier to generate & read way, like:which yields the expected AST:
So a warning there is useful to encourage systems to move to a safer way of handling metadata in general, by using metadata defined in other blocks or in other files (like the
--metadata-file
option) .The text was updated successfully, but these errors were encountered: