Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link tokens [] are escaped #311

Closed
mdeweerd opened this issue Mar 1, 2022 · 8 comments
Closed

Link tokens [] are escaped #311

mdeweerd opened this issue Mar 1, 2022 · 8 comments
Labels
bug Something isn't working invalid This doesn't seem right

Comments

@mdeweerd
Copy link

mdeweerd commented Mar 1, 2022

Describe the bug

- [KiCAD](KiCAD)
- [KiCAD - Scripts](KiCAD - Scripts)
- [KiCAD - CNC](KiCAD - CNC)

is converted to

- [KiCAD](KiCAD)
- \[KiCAD - Scripts\](KiCAD - Scripts)
- \[KiCAD - CNC\](KiCAD - CNC)

Rendering the links useless.

There are links that are properly resolved in gitlab.

expectation
The link is maintained.

problem
This is a problem because this breaks the expected behaviour after applying mdformat.

Reproduce the bug

Apply mdformat on

- [KiCAD](KiCAD)
- [KiCAD - Scripts](KiCAD - Scripts)
- [KiCAD - CNC](KiCAD - CNC)

List your environment

I suggest you update the hint to get the version (jupyter-hook does not exist).

$ mdformat --version
mdformat 0.7.13 (mdformat_frontmatter: 0.4.1, mdformat_gfm: 0.3.5,
mdformat_tables: 0.4.1)
$ python --version
Python 3.9.10

pre-commit configuration:

  - repo: https://github.com/executablebooks/mdformat
    # Do this before other tools "fixing" the line endings
    rev: 0.7.13
    hooks:
      - id: mdformat
        name: Format Markdown
        entry: mdformat  # Executable to run, with fixed options
        language: python
        types: [markdown]
        args: [--wrap, '75', --number]
        additional_dependencies:
          - mdformat-toc
@mdeweerd mdeweerd added the bug Something isn't working label Mar 1, 2022
@hukkin
Copy link
Owner

hukkin commented Mar 1, 2022

Hey there!

The latter two links do not have a valid link destination, so a CommonMark compliant parser will not parse them as links.

Here's the spec for link destination https://spec.commonmark.org/0.30/#link-destination
You will need to either remove the spaces, or surround the link destination with angle brackets to make it valid.

@hukkin hukkin added the invalid This doesn't seem right label Mar 1, 2022
@mdeweerd
Copy link
Author

mdeweerd commented Mar 1, 2022

I hear that "it's not valid CommonMark compliant', but mdformat is changing the meaning of the tokens by adding backslashes that are not needed.

Now, I appreciate that mdformat is trying to make non compliant code more compliant, but I'ld prefer that it does this in a way that is functionnally compatible with what I can already observe.

Strangely enough, gitlab still accepts \[my link\](my link) as if it was written [my link](my link). Still as a reader I expect that this would no longer be a link.

On gitlab

  • (desc)[KiCAD Scripts] links to KiCAD%20Scripts
  • (desc)[KiCAD-Scripts] links to KiCAD-Scripts (and that is equivalent to 'KiCAD%20Scripts')

Why spaces in the link?
Having spaces in the link is an easy way to add links because it's just a copy/paste operation, and if a tool like | mdformat would improve this later on by making appropriate changes, then that's fine by me.

That would mean spaces are replaced with dashes (-) or '%20' sequences. As web browsers convert spaces to '%20' and dashes are still dashes in the end, it's preferable to convert spaces to %20 (Example with-space .

It can be argued that mdformat should not add backslashes to the [ and ] characters as under valid CommonMark that probably does not change anything or makes invalid code 'valid' at worst changing the meaning of it in some environments (like gitlab).

It can also that adding a backslash before '[' as for '*' has some usability for cases where this is the user's intention. An algorithm could use the heuristic that the opening and closing parentheses and bracket ([]()) need to be either on the same line or enclose a sequence of characters of reasonable length.

Under the CommonMark reference which is referenced to justify to not change mdformat, it can equally be said that there is no use in backslashing the [] tokens. Having unpaired [] tokens in the link text is as invalid as having spaces in the link destination under the CommonMark specification. So IMHO either both should be "fixed" or both should be ignored for consistency

Another possibility is to let the user choose through some option(s) to choose either one or the other behavior.

I tested the following markdown in both environments, including after applying mdformat. I show the result before applying mdformat, after applying it the rendering seems to be the same.

- [[abc]](link)
- [my link](my link)
- [(abc)](link)
- [(abc)](link "Link Title")
- [link with link title](/uri "My Link Title")
- [[abc](https://github.githubassets.com/favicons/favicon-dark.png)](NotALinkBecauseNotAnImage "Link Title")
- [![abc](https://github.githubassets.com/favicons/favicon-dark.png)](link "Link Title()")
- [![abc](https://github.githubassets.com/favicons/favicon-dark.png)](link "Link Title\(\)")
- [![abc](https://github.githubassets.com/favicons/favicon-dark.png)](link (Link Title))
- [![abc](https://github.githubassets.com/favicons/favicon-dark.png)](link (Invalid Link (Title))

There is a difference only for link targets with spaces:

On gitlab:

gl

On github:

So I still plead to cope with this case differently.

@hukkin
Copy link
Owner

hukkin commented Mar 1, 2022

I hear that "it's not valid CommonMark compliant', but mdformat is changing the meaning of the tokens by adding backslashes that are not needed.

It's not changing any meaning. The invalid links are still invalid links, and the escape character is not significant in any way in CommonMark AST.

Strangely enough, gitlab still accepts [my link](my link) as if it was written [my link](my link). Still as a reader I expect that this would no longer be a link.

I agree that this is very weird. It's an issue with GitLab's parser/renderer however, not mdformat.

That would mean spaces are replaced with dashes (-) or '%20' sequences. As web browsers convert spaces to '%20' and dashes are still dashes in the end, it's preferable to convert spaces to %20

The change you suggest is altering Markdown AST (converting text to link). Mdformat will never do such changes.

It can be argued that mdformat should not add backslashes to the [ and ] characters as under valid CommonMark that probably does not change anything or makes invalid code 'valid' at worst changing the meaning of it in some environments (like gitlab).

This is a separate issue #112

@mdeweerd
Copy link
Author

mdeweerd commented Mar 1, 2022

This is a separate issue #112

Yes and no. That issue seems to be mainly arguing that there is overuse of backslashes. I am arguing that this may change meaning according to the environement, and (high) that what is done for other characters could be done for spaces as well.

I was mentioning Gitlab's behavior to highlight what they do and that's it's rather intuitive. Innovation is often "wrong" until the specification or norm changes.

the escape character is not significant in any way in CommonMark AST.

Then the interpretation of \[my link\](my link) by github is normal given how it interprets [my link](my link) if the escape character is not significant.

Anyway, I am closing this as it is clear I am not going to change opinions here.

@mdeweerd mdeweerd closed this as completed Mar 1, 2022
@hukkin
Copy link
Owner

hukkin commented Mar 1, 2022

I am arguing that this may change meaning according to the environement

I see. Mdformat doesn't target any environments, however. It is CommonMark compliant, and if the environment is not, then that is the environment's issue.

@mdeweerd
Copy link
Author

mdeweerd commented Mar 1, 2022

I may be stretching the Robustness principle a bit, but I'ld say that not adding '' to [ still maintains the "compliancy" of input documents when they are and supposing that spaces in link targets are part of the link is inline with "accept non-conformant input as long as the meaning is clear" in documents that weren't officially compliant in the first place.

@mdeweerd
Copy link
Author

mdeweerd commented Mar 2, 2022

While checking if tickboxes are part of the CommonMark specification, I stumbled on this discussion on the CommonMark forum: https://talk.commonmark.org/t/support-for-gfm-like-checkboxes/2007/9 .

So it seems that CommonMark is frozen to cover features from John Gruber's Markdown only which leaves little room for adding features to mdformat's "core" to keep up with what is happening in the field - unless the extensions or plugins scope with variations on common mark.

So let's say this particularity of the link feature should be part of some plugin like mdformat-gitlab .

@hukkin
Copy link
Owner

hukkin commented Mar 2, 2022

👍 It is definitely possible to achieve what you want with a plugin. There's info on how to make a plugin in the docs. Also the plugin template may prove useful https://github.com/executablebooks/mdformat-plugin/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working invalid This doesn't seem right
Projects
None yet
Development

No branches or pull requests

2 participants