html2htpy includes unnecessary whitespace #74

Open
pelme opened this issue Jan 18, 2025 · 10 comments

Comments

@pelme
Owner

pelme commented Jan 18, 2025

$ printf '<p>     \t\t\nHi\n\n\t  </p>\n' | html2htpy
from htpy import p

p[""" 		Hi	 """]

It would be nicer if the output were just:

p["Hi"]

@OleJoik
Contributor

OleJoik commented Jan 31, 2025

I remember considering stripping the strings, but decided against it for some reason. I can't quite recall why, but it might have had to do with some cases where I actually preferred the whitespace to stay as is. Possibly there are some quirks with this in multi-line strings.

I can take a look.

@pelme
Owner Author

pelme commented Jan 31, 2025

I am not sure what the exact rules should be; not all whitespace is safe to remove. I guess HTML minifiers and similar tools could serve as inspiration for how to properly strip some of the whitespace.
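A minimal sketch of the minifier-style idea, assuming the goal is to roughly mimic how browsers render text under the default CSS `white-space: normal`, where each run of whitespace collapses to a single space. The helper name `collapse_whitespace` is hypothetical and not part of html2htpy.

```python
import re

# HTML "ASCII whitespace": tab, line feed, form feed, carriage return, space.
_HTML_WHITESPACE = re.compile(r"[\t\n\f\r ]+")


def collapse_whitespace(text: str) -> str:
    """Collapse each run of HTML whitespace into a single space.

    This roughly mirrors how `white-space: normal` text is rendered.
    It does not trim the ends of the string.
    """
    return _HTML_WHITESPACE.sub(" ", text)


raw = " \t\t\nHi\n\n\t  "
print(repr(collapse_whitespace(raw)))          # ' Hi '
print(repr(collapse_whitespace(raw).strip()))  # 'Hi'
```

Collapsing alone still leaves one space at each end. Whether that space may also be dropped depends on the neighbouring inline content, which is exactly the part that is not always safe.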

@OleJoik
Contributor

OleJoik commented Feb 1, 2025

One special case, at least, would be <pre> tags, which should preserve most spaces and newlines.
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre
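A minimal sketch of how a converter could tell whether a text node sits inside `<pre>`, assuming the input is walked with Python's standard `html.parser.HTMLParser` (an assumption; this is not necessarily how html2htpy parses its input). `TextNodeCollector` is a hypothetical name.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect text nodes and record whether each one is inside <pre>."""

    def __init__(self) -> None:
        super().__init__()
        self.pre_depth = 0  # number of <pre> elements currently open
        self.nodes = []     # list of (text, inside_pre) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1

    def handle_data(self, data):
        # Text inside <pre> must be kept verbatim; elsewhere it may be trimmed.
        self.nodes.append((data, self.pre_depth > 0))


collector = TextNodeCollector()
collector.feed("<div>\n  Hi\n  <pre>  keep\n   this </pre>\n</div>")
collector.close()
for text, inside_pre in collector.nodes:
    print(repr(text), inside_pre)
```

A counter is enough here instead of a full tag stack: only `<pre>` start and end tags change it, so void elements such as `<br>`, which never get an end tag, cannot unbalance it.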

@OleJoik
Contributor

OleJoik commented Feb 1, 2025

Another thing to consider: other HTML tags can also have goofy behavior related to CSS properties such as white-space and white-space-collapse. After reviewing this, I'm not sure stripping whitespace is ever a good idea, as in quite a few cases the whitespace does matter.

To play around a bit, I made a CodePen based on this MDN example that showcases a couple of cases where we would not want (all) whitespace trimmed. Below is a little teaser:

<!-- Example: Newlines preserved -->
<h2 style="white-space-collapse: preserve-breaks;">
  
  In this case
  all  the   newlines   are   preserved
  in    the     heading, 
  but    spaces    are    collapsed     .
  
</h2>

One option might be to filter for these properties and handle them in a special way. I'm not sure I'd recommend it, though, as it might quickly get complicated. The W3C spec defines a non-trivial set of whitespace processing rules that we should probably try to abide by if we do decide to do some trimmin'.

Edit to add: these properties might also be inherited or assigned via a class, as in the CodePen, so it would be impossible to accurately recognize when they apply to a given string.
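To make the limitation concrete, here is a sketch of the "filter for these properties" idea, assuming only an element's inline `style` attribute is inspected. It spots `white-space` and `white-space-collapse` declarations written directly on the element, but a class, a stylesheet rule or an inherited value stays invisible to it. `has_inline_whitespace_style` is a hypothetical helper.

```python
import re
from typing import Optional

# A `white-space` or `white-space-collapse` declaration at the start of the
# style string or directly after a semicolon.
_WS_DECLARATION = re.compile(
    r"(?:^|;)\s*white-space(?:-collapse)?\s*:", re.IGNORECASE
)


def has_inline_whitespace_style(style: Optional[str]) -> bool:
    """Return True if an inline style attribute sets a whitespace property.

    Only the element's own style attribute is checked: rules applied via a
    class, a stylesheet or inheritance from a parent are not detected.
    """
    return bool(_WS_DECLARATION.search(style or ""))


print(has_inline_whitespace_style("white-space-collapse: preserve-breaks;"))  # True
print(has_inline_whitespace_style(None))  # False
```

With `html.parser`, the attributes arrive as a list of `(name, value)` pairs, so a caller would pass `dict(attrs).get("style")`.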

@OleJoik
Contributor

OleJoik commented Feb 1, 2025

An option might be to disregard the CSS properties and trim away, but leave <pre> elements as is. The benefit might outweigh the cost for those few (?) users going crazy with whitespace CSS properties.

@pelme
Owner Author

pelme commented Feb 1, 2025

What about:

  • Apply text.replace(" ", " ").strip() to all text nodes except those in <pre>, <code> and <textarea> (maybe there are a few others too).
  • Add --preserve-whitespace to opt out of this behaviour.

HTML copied and pasted from the internet is often full of whitespace that is only there to make the HTML source look nice, not because it is required from a functional point of view.
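A rough sketch of this proposal using only the standard library. It is not html2htpy's actual code: the class and function names are hypothetical, it only collects text nodes rather than emitting htpy, and it assumes the `replace(...)` step is meant to collapse whitespace runs into single spaces before `strip()`. The `--preserve-whitespace` flag name comes from the proposal; the argparse wiring is illustrative.

```python
import argparse
import re
from html.parser import HTMLParser

# Elements whose text is kept verbatim, as proposed in the thread
# ("maybe there are a few others too").
PRESERVE_TAGS = {"pre", "code", "textarea"}
# HTML "ASCII whitespace": tab, line feed, form feed, carriage return, space.
_WHITESPACE_RUN = re.compile(r"[\t\n\f\r ]+")


class TextNormalizer(HTMLParser):
    """Collect text nodes, trimming them unless they sit in a preserved element."""

    def __init__(self, preserve_whitespace: bool = False) -> None:
        super().__init__()
        self.preserve_whitespace = preserve_whitespace
        self.preserve_depth = 0  # number of currently open PRESERVE_TAGS elements
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in PRESERVE_TAGS:
            self.preserve_depth += 1

    def handle_endtag(self, tag):
        if tag in PRESERVE_TAGS and self.preserve_depth > 0:
            self.preserve_depth -= 1

    def handle_data(self, data):
        if self.preserve_whitespace or self.preserve_depth > 0:
            self.texts.append(data)  # keep the text node exactly as written
            return
        # Assumption: collapse whitespace runs to one space, then trim the ends.
        cleaned = _WHITESPACE_RUN.sub(" ", data).strip()
        if cleaned:  # drop nodes that only held source-formatting whitespace
            self.texts.append(cleaned)


def normalize_texts(html: str, preserve_whitespace: bool = False) -> list:
    """Return the text nodes of `html` after the proposed whitespace handling."""
    parser = TextNormalizer(preserve_whitespace)
    parser.feed(html)
    parser.close()
    return parser.texts


def build_arg_parser() -> argparse.ArgumentParser:
    """Illustrative CLI wiring for the proposed --preserve-whitespace opt-out."""
    cli = argparse.ArgumentParser(description="whitespace handling sketch")
    cli.add_argument(
        "--preserve-whitespace",
        action="store_true",
        help="keep whitespace in text nodes exactly as written",
    )
    return cli


args = build_arg_parser().parse_args(["--preserve-whitespace"])
print(normalize_texts("<p> \t\t\nHi\n\n\t  </p>"))                 # ['Hi']
print(normalize_texts("<p>  Hi  </p>", args.preserve_whitespace))  # ['  Hi  ']
```

One trade-off of trimming every node independently: `normalize_texts("<p>Hello <b>world</b></p>")` returns `['Hello', 'world']`, so the space between "Hello" and the bold "world" is lost. A real implementation would probably need to keep a single space at such inline boundaries.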

@pelme
Owner Author

pelme commented Feb 1, 2025

To be clear: I do not think we should bother about white-space-collapse: preserve-breaks. AFAIK I have never used it myself, and I think it is pretty rare. html2htpy is mostly meant as a starting point when converting a bulk of HTML to htpy; the result still requires manual inspection and fixups.

@OleJoik
Contributor

OleJoik commented Feb 1, 2025

I agree with your comments, and I guess this should be pretty simple to implement. I have some time; shall I open a PR tomorrow?

@pelme
Owner Author

pelme commented Feb 1, 2025

That is very welcome! :)

@OleJoik
Contributor

OleJoik commented Feb 2, 2025

#84

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants