Automatic text wrapping/text box filling #6201

Open
machineonamission opened this issue Apr 11, 2022 · 19 comments

@machineonamission

Many people have written scripts to do this, and it's relatively easy with the library's getsize function, but I feel like it really should be a built-in feature.

I think there should be a draw_text_box or similar function with these properties on top of the existing ImageDraw.multiline_text:

  • you can specify a max width in pixels and Pillow will automatically insert line breaks to keep the text's width under the max width
    • I think there should be an option to only break at word boundaries unless the word exceeds the max width, like the CSS word-break property
  • you can specify a max height in pixels with one of two behaviors:
    • when reaching the max height, the text box is clipped and doesn't draw any more.
    • when reaching the max height, the font size is gradually reduced until the text fits inside
      • I think a min font size property would be useful here
    • also an option to find the biggest font size that fits within the text box
      • a max font size property would be useful here
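
A minimal sketch of the max-width behavior requested above, using a greedy algorithm and a caller-supplied `measure` function that returns the width of a string in pixels. The name `wrap_text` and its parameters are hypothetical, not an existing Pillow API. Text is broken at spaces, and a single word that is wider than `max_width` on its own is split between characters, roughly like the CSS word-break behavior described in the list.

```python
from typing import Callable, List


def wrap_text(text: str, measure: Callable[[str], float], max_width: float) -> List[str]:
    """Greedily wrap text so that each line measures at most max_width.

    Hypothetical helper: breaks only at spaces (U+0020); a word that is wider
    than max_width on its own is split between characters. A single character
    wider than max_width is still placed on its own line.
    """
    lines: List[str] = []
    for paragraph in text.split("\n"):  # existing newlines are mandatory breaks
        current = ""
        for word in paragraph.split(" "):
            candidate = word if not current else current + " " + word
            if measure(candidate) <= max_width:
                current = candidate
                continue
            if current:
                lines.append(current)
            # Start a new line with this word, splitting it between characters
            # whenever the next character would overflow the line.
            current = ""
            for char in word:
                if current and measure(current + char) > max_width:
                    lines.append(current)
                    current = char
                else:
                    current += char
        lines.append(current)
    return lines
```

With Pillow, `measure` could be the `getlength` method of a font returned by `ImageFont.truetype`, and `"\n".join(lines)` could then be drawn with `ImageDraw.Draw.multiline_text`. Passing `len` instead gives a fixed-width approximation that is handy for testing.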

I'm not very experienced with the library internals or C in general, so for now I won't open a pull request; I just want to throw the idea out there to the devs.

@machineonamission
Author

machineonamission commented Apr 11, 2022

Options to align text left, right, center, or justify would also be useful. This script implements that.

Top/middle/bottom text alignment would help too; here is another example script.

@nulano
Contributor

nulano commented Apr 11, 2022

Sounds like a reasonable enhancement request. Note that there might be issues with mixed LTR/RTL text which will need extra tests.

A few notes:

with the library's getsize func

Please use the getlength function instead; the API of the getsize function is fundamentally broken (especially with non-English text) and shouldn't be used for text layout purposes.

I think there should be an option to only break at word boundaries unless the word exceeds the max width, like the CSS word-break property

Word breaking is quite a difficult task; I'd suggest constraining this to spaces to start with.

when reaching the max height, the font size is gradually reduced until the text fits inside

Not possible within the current API, a font is created at a given size and cannot be easily changed.

also options to align text left, right, center

Already possible using the align parameter, only justify is not yet supported. It would also require extra work in the new proposed function, so I see that as a separate request.
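
Justification is the one alignment mode not covered by `align`. A minimal sketch of the usual approach, assuming the word widths come from a measuring function such as `getlength`: the leftover width is distributed evenly across the gaps between words. `justify_positions` is a hypothetical helper; in practice the last line of a paragraph would normally stay left-aligned.

```python
from typing import List, Sequence


def justify_positions(word_widths: Sequence[float], line_width: float) -> List[float]:
    """Return the x offset of each word so the line spans exactly line_width.

    Hypothetical helper: the leftover space is split evenly between the gaps.
    A single word cannot be justified, so it is placed at x = 0.
    """
    if len(word_widths) < 2:
        return [0.0] * len(word_widths)
    gap = (line_width - sum(word_widths)) / (len(word_widths) - 1)
    positions: List[float] = []
    x = 0.0
    for width in word_widths:
        positions.append(x)
        x += width + gap  # next word starts after this word plus one gap
    return positions
```

For example, words of widths 10, 20 and 30 on a 100 px line get gaps of 20 px, so they start at x = 0, 30 and 70 and the last word ends exactly at 100.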

and top/middle/bottom text alignment

Already possible, use the anchor parameter with multiline text.

@machineonamission
Author

Please use the getlength function instead; the API of the getsize function is fundamentally broken (especially with non-English text) and shouldn't be used for text layout purposes.

oh, good to know. probably should add that to the docs?

Word breaking is quite a difficult task, it might be better to constrain this to spaces to start with.

Right, yeah, I forgot CJK and other languages don't have definite characters at word boundaries. I just meant breaking at whitespace, which as I understand it shouldn't be too hard and might actually be faster than breaking at individual characters. I'd add support for zero-width spaces so that an external library or a native speaker could mark word boundaries for Pillow, assuming that is a non-trivial task.

Not possible within the current API, a font is created at a given size and cannot be easily changed.

Ah, so that's why the script I linked reloads the font from file on every change. Is there an existing way to regenerate fonts without reloading from file, or would that need an entire API change?

Thanks for the quick and detailed response!

@nulano
Contributor

nulano commented Apr 11, 2022

oh, good to know. probably should add that to the docs?

I think it will be deprecated soon, it's just a matter of working out the replacement (font.getsize_multiline doesn't have a clear replacement, that might be made easier by cleaning up the parameters as suggested in #6195 (comment)). Discussed in #5816.

i'd add support for zero-width spaces

That part was just a suggestion to avoid overcomplicating things. Sure, zero-width spaces can probably be supported.

is there an existing way to regenerate fonts without reloading from file or would that need to be an entire API change?

I think you can load a font file in Python and then pass the bytes as input to ImageFont.truetype.
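
Building on that, the "biggest font size that fits" request can be sketched as a binary search over integer sizes, with the font file read once and a new font object created per candidate size. The sketch assumes fitting is monotonic in size (if a size does not fit, no larger size fits) and takes a hypothetical `fits(size)` predicate so the search itself does not depend on Pillow; `min_size` and `max_size` play the role of the proposed min/max font size properties.

```python
from typing import Callable, Optional


def largest_fitting_size(
    fits: Callable[[int], bool], min_size: int, max_size: int
) -> Optional[int]:
    """Return the largest size in [min_size, max_size] for which fits(size) is True.

    Assumes fits() is monotonic: if it is False at some size, it is also False
    at every larger size. Returns None if even min_size does not fit.
    """
    best: Optional[int] = None
    lo, hi = min_size, max_size
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid  # mid fits; try something larger
            lo = mid + 1
        else:
            hi = mid - 1  # mid is too big; try something smaller
    return best
```

With Pillow, the predicate could read the file once (`font_bytes = open(path, "rb").read()`), build `ImageFont.truetype(io.BytesIO(font_bytes), size)` for each candidate, wrap the text with that font, and compare the result against the box. This needs about log2(max_size - min_size) font loads instead of one per size.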

@machineonamission
Author

machineonamission commented Apr 11, 2022 via email

@atomicparade

atomicparade commented May 7, 2022

I've made a first attempt at implementing this, using a greedy algorithm:

https://github.com/atomicparade/pil_autowrap/blob/main/pil_autowrap/pil_autowrap.py#L73-L220

Example output here.

Issues:

  • I don't know how appropriate the results are for Arabic and Hebrew. Chinese, Japanese, and Korean text is not broken up properly.
  • I made the assumption that the line height is equal to the font size; however, looking at some of the generated images for Arabic and Hebrew, this doesn't appear to be the case. Maybe FreeTypeFont.getbbox would be more appropriate than FreeTypeFont.getlength?

Current blind spots and possible improvements:

  • Test wrapping with non-breaking spaces.
  • Allow wrapping after hyphens (but not after non-breaking hyphens).
  • When there are multiple spaces between two words, preserve the number of spaces (as long as the text isn't wrapped between those two words).
  • Add support for top-to-bottom text.

@nulano
Contributor

nulano commented May 7, 2022

I don't know how appropriate the results are for Arabic and Hebrew. Chinese, Japanese, and Korean text is not broken up properly.

If you are referring to this:

Certain characters in those languages should not come at the end of a line, certain characters should not come at the start of a line, and some characters should never be split up across two lines. For example, periods and closing parentheses are not allowed to start a line

then I would not worry about it. Similar rules exist in some European languages and even MS Word doesn't really help there.


I made the assumption that the line height is equal to the font size; however, looking at some of the generated images for Arabic and Hebrew, this doesn't appear to be the case. Maybe FreeTypeFont.getbbox would be more appropriate than FreeTypeFont.getlength?

The text height is calculated here:

Pillow/src/PIL/ImageDraw.py, lines 514 to 516 at commit 1340237:

line_spacing = (
    self.textsize("A", font=font, stroke_width=stroke_width)[1] + spacing
)

where spacing is a parameter defaulting to 4. This is not really accurate for some fonts, but it is used for historical reasons.

Do not use getbbox. That returns the height of the rendered text (which could be different for each line) and the width of the rendered text (again, this can differ with e.g. slanted text). It is not appropriate for text layout. Fonts generally don't exceed the line height and layout width they report, or only do so by a small amount when appropriate for stylistic reasons. (The height calculated above is not the actual line height reported by the font, but it should be close enough in most cases.)
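
As a worked example of that formula: if each line advances by `line_spacing = h + spacing`, where `h` is the height measured for "A", then line i starts at i * line_spacing and a block of n lines spans about n*h + (n - 1)*spacing. A max-height check for the proposed text box could use this estimate. The helper name below is hypothetical and, like the formula it mirrors, it is an approximation, not the exact rendered height.

```python
def estimated_block_height(num_lines: int, line_height: float, spacing: float = 4) -> float:
    """Approximate height of num_lines lines of multiline text.

    Assumes each line advances by line_height + spacing (spacing defaults to 4,
    as in the quoted code) and that the last line itself is line_height tall.
    """
    if num_lines <= 0:
        return 0.0
    line_spacing = line_height + spacing
    return (num_lines - 1) * line_spacing + line_height
```

For three lines with h = 20 and the default spacing of 4, this gives 2 * 24 + 20 = 68, the same as 3 * 20 + 2 * 4.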

@machineonamission
Author

I feel that getting it working with “easier” languages first (ones that use white space or other characters to break words) would be the best thing to do right now as CJK word-breaking seems like a non-trivial task that could be hacked in by adding zero-width spaces. Is there an existing library that can determine word boundaries that could be included by PIL as an extra?

@nulano
Contributor

nulano commented May 8, 2022

I may have misunderstood the Wikipedia article. The Unicode Line Breaking Algorithm is more helpful.

I think that it is probably sufficient to implement the non-tailorable part of the algorithm (see the start of Table 1), which is just that line break characters are a mandatory break and spaces/zero-width spaces are an optional break. According to LineBreak.txt, this means it is sufficient to consider replacing the SPACE (U+0020) and ZERO WIDTH SPACE (U+200B) characters with "\n". The rest of the Unicode Line Breaking Algorithm would probably be best left to another library (e.g. by inserting zero-width spaces).
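
A rough sketch of that non-tailorable subset, assuming any further word boundaries have already been marked with zero-width spaces by another tool: line break characters (the BK, CR, LF and NL classes of UAX #14) split the text into paragraphs, and a line may then be broken after any run of SPACE or ZERO WIDTH SPACE. The function name is hypothetical. Other spaces, such as NO-BREAK SPACE (U+00A0), are deliberately not break points.

```python
import re
from typing import List

# Mandatory breaks: CR LF, LF, CR, VT, FF, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR.
_MANDATORY = re.compile(r"\r\n|[\n\r\v\f\x85\u2028\u2029]")
# One breakable segment: characters that are not break opportunities, followed
# by any run of SPACE / ZERO WIDTH SPACE. A line may end after each segment.
_SEGMENT = re.compile(r"[^ \u200b]*[ \u200b]*")


def split_break_opportunities(text: str) -> List[List[str]]:
    """Split text into paragraphs (mandatory breaks) of breakable segments."""
    paragraphs: List[List[str]] = []
    for paragraph in _MANDATORY.split(text):
        # findall also yields an empty match at the end; drop empty segments.
        segments = [s for s in _SEGMENT.findall(paragraph) if s]
        paragraphs.append(segments)
    return paragraphs
```

A greedy wrapper would then add segments to a line while they fit and strip trailing spaces and zero-width spaces from each finished line.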

@atomicparade
Copy link

Is there an existing library that can determine word boundaries that could be included by PIL as an extra?

After a brief search, I couldn't find one that is freely available.

The Unicode Line Breaking Algorithm is more helpful.

I'm going to give this a shot! I'll start with Table 1 and leave all of the other character classes as break-allowed for now, though I think I'd like to try to implement the others as well.

@machineonamission
Author

If you're going to implement the entire Unicode Line Breaking Algorithm, I recommend making it its own library.
If it's really complex or requires a table of characters or something, it could be specified as a PIL extra so as not to bloat PIL.

@nulano
Contributor

nulano commented May 9, 2022

If you want to implement the full algorithm, it might make sense to add it to Raqm (which Pillow uses internally), or make it a separate library that Raqm can use. See HOST-Oman/libraqm#50

requires a table of characters or something

The LineBreak.txt file from Unicode that I linked above is the official list of Unicode character line-breaking classes.

@machineonamission
Author

machineonamission commented May 9, 2022

I wasn’t familiar enough with PIL’s internals to suggest that but that is a good idea

@nulano
Contributor

nulano commented May 9, 2022

Is there an existing library that can determine word boundaries that could be included by PIL as an extra?

After a brief search, I couldn't find one that is freely available.

The Raqm issue mentions https://github.com/adah1972/libunibreak. I haven't looked at it too closely, but it seems to be a C library implementing the Unicode algorithm that returns a list of valid break positions.

@atomicparade

atomicparade commented May 11, 2022

I am exploring adding a font_wraptext function (to start; would probably be nice to have a function to automatically determine an appropriate font [size] as well) to src/_imagingft.c and adding unibreak as a feature that depends on libunibreak being installed.

It does look like libunibreak maintains internal state (linebreak.c -> set_linebreaks_utf8), so I am not sure whether this will work well with multithreading. (Never mind! It doesn’t.)

Edit: Somehow I completely missed the part about adding this as a feature to libraqm itself. Hmm...

radarhere changed the title from "Feature Request: automatic text wrapping/text box filling" to "Automatic text wrapping/text box filling" on Aug 5, 2022.
@DinoSourcesRex

Any news on this? Would love this feature.

@aclark4life
Member

This looks like a stalled attempt at greatness; maybe someone can pick up the effort: https://github.com/atomicparade/pil_autowrap

@quantumpotato

This comment was marked as off-topic.

@Pandede

Pandede commented Jan 23, 2025

Any progress on this topic? It would be an important feature.

Projects
None yet
Development

No branches or pull requests

8 participants