unexpanded < > & #109

Closed
stefanor opened this issue Jan 14, 2016 · 15 comments

@stefanor
Contributor

From: https://bugs.debian.org/791470

Version: 2015.6.21-1 (and current master):

$ echo '<body>&lt;&gt;&amp;</body>' | html2markdown
&lt;&gt;&amp;

It worked correctly in 2014.9.25-1:

$ echo '<body>&lt;&gt;&amp;</body>' | html2markdown
<>&
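
For comparison, the expected expansion can be reproduced with the Python standard library alone. This is a minimal sketch, not html2text's implementation: `TextExtractor` and `html_to_text` are hypothetical names introduced here, and the only behavior relied on is that `html.parser.HTMLParser` expands character references when `convert_charrefs=True`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text content and drop the tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser expand &lt; &gt; &amp;
        # before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(markup):
    """Return the text content of markup with entities expanded."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(html_to_text("<body>&lt;&gt;&amp;</body>"))  # prints <>&
```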

Alir3z4 added the bug label Jan 14, 2016

@stefanor
Contributor Author

Bisect blames 446a8eb (that is #57)

@jwilk
Contributor

jwilk commented Jan 14, 2016

Relevant part of Markdown documentation: https://daringfireball.net/projects/markdown/syntax#autoescape
While leaving the entities unexpanded is technically OK, it makes the output unnecessarily illegible for human readers.

@theSage21
Collaborator

@jwilk any reasons I should avoid closing this issue?

@tahajahangir

@jwilk The relevant part of the Markdown spec is about converting Markdown to HTML, not about converting HTML to Markdown.

If someone writes Markdown only to convert it to HTML (and not to present it directly to humans), it's OK to use entities. But for an html2text library it's unacceptable.

We use this library to convert the HTML part of emails to the text part. With newer versions of html2text, the input:
From: "My Name"<span>&lt;[email protected]&gt;</span>
generates output
From: "My Name"&lt;[email protected]&gt;
but it should generate
From: "My Name"<[email protected]>

@theSage21 Please reopen the issue (to discuss and revert #57)
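
Until the behavior changes, a caller in the email scenario above could expand the leftover entities after conversion. The following is a sketch under assumptions: `expand_entities` is a hypothetical helper, `user@example.com` is a placeholder address rather than the one from the report, and the approach would also expand entities that were deliberately double-escaped in the source HTML.

```python
import html


def expand_entities(text):
    """Expand HTML character references left behind in converted text."""
    return html.unescape(text)


# Placeholder address, standing in for the one in the report.
reported = 'From: "My Name"&lt;user@example.com&gt;'
print(expand_entities(reported))  # prints From: "My Name"<user@example.com>
```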

theSage21 reopened this Jun 13, 2016

@theSage21
Collaborator

I am going to be on a flaky Internet connection for about a month. I'll try to fix this as soon as possible.

@theSage21
Collaborator

I find this convincing. aaronsw/html2text#59
How about a -human flag like the du command? That would make sense.

@tahajahangir

tahajahangir commented Jul 13, 2016

I suggest making it an --html-escape flag, although users can do the escaping themselves after converting HTML to text.
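
The opt-in escaping idea could look roughly like this. It is a sketch only: the `--html-escape` option is the proposal from this comment, not an existing html2text flag, and `build_parser` and `postprocess` are hypothetical names. Only `argparse` and `html.escape` from the standard library are used.

```python
import argparse
import html


def build_parser():
    """Hypothetical command-line parser carrying the proposed flag."""
    parser = argparse.ArgumentParser(description="sketch of an opt-in escape flag")
    parser.add_argument(
        "--html-escape",
        action="store_true",
        help="re-escape <, > and & in the converted text",
    )
    return parser


def postprocess(text, html_escape=False):
    """Escape <, > and & only when the caller asks for it."""
    # quote=False leaves " and ' alone; only &, < and > are replaced.
    return html.escape(text, quote=False) if html_escape else text


args = build_parser().parse_args(["--html-escape"])
print(postprocess("<>&", args.html_escape))  # prints &lt;&gt;&amp;
```

With this shape, expanded entities stay the default and escaped output has to be requested explicitly.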

gabalese pushed a commit to rolepoint/html2text that referenced this issue Feb 7, 2017
I don't see the point of keeping HTML entities as-is during the
conversion (i.e. not converting &amp; into &). The reason for doing it
that way originally is not convincing:
Alir3z4#109
@bjones1

bjones1 commented Feb 20, 2017

The patch @gabalese made fixes this issue for me -- would you consider applying it?

@theSage21
Collaborator

Seems good. @Alir3z4 you agree?

@Alir3z4
Owner

Alir3z4 commented Feb 26, 2017

@theSage21 It makes sense to me too.

@bjones1

bjones1 commented Mar 3, 2017

Pinging -- would you like me to submit a pull request containing @gabalese's fix? Or would you prefer to apply it yourself?

@Alir3z4
Owner

Alir3z4 commented Mar 4, 2017

@bjones1 Please feel free to send the patch. I'll make sure it gets reviewed quickly and released as soon as possible.

@ciprianmiclaus

I will pick this up and send a patch.

ciprianmiclaus pushed a commit to ciprianmiclaus/html2text that referenced this issue Jun 15, 2017
@Alir3z4
Owner

Alir3z4 commented Jun 15, 2017 via email

@bjones1

bjones1 commented Jun 21, 2017

@ciprianmiclaus, thanks. I got bogged down in other areas. This fix will help me on my projects.

snarfed added a commit to snarfed/granary that referenced this issue Dec 26, 2019
…o `s

... so that GitHub renders HTML entities like &gt; inside them instead of leaving them escaped. background: https://chat.indieweb.org/dev/2019-12-24#t1577174464779200

the html2text people have gone back and forth on how to handle this: Alir3z4/html2text#57, Alir3z4/html2text#109, etc. original bridgy publish github escaping was added in snarfed/bridgy#810.

7 participants