Automatic text wrapping/text box filling #6201

machineonamission · 2022-04-11T18:59:57Z

many people have written scripts to do this and it's relatively easy with the library's getsize func but I feel like it really should be a built-in feature.

i think there should be a draw_text_box or similar function which has these properties on top of the existing draw_multiline_text:

- you can specify a max width in pixels and Pillow will automatically insert line breaks to keep the text's width under the max width
  - I think there should be an option to only break at word boundaries unless the word exceeds the max width, like the CSS word-break property
- you can specify a max height in pixels with one of two behaviors:
  - when reaching the max height, the text box is clipped and doesn't draw any more.
  - when reaching the max height, the font size is gradually reduced until the text fits inside
    - I think a min font size property would be useful here
  - also an option to find the biggest font size that fits within the text box
    - a max font size property would be useful here

I'm not very experienced with the library internals or C in general so for now I won't make a pull, I just want to throw the idea out there to the devs

machineonamission · 2022-04-11T19:02:29Z

also options to align text left, right, center, or justify would be useful as well. This script implements that

and top/middle/bottom text alignment. another example script

nulano · 2022-04-11T19:40:33Z

Sounds like a reasonable enhancement request. Note that there might be issues with mixed LTR/RTL text which will need extra tests.

A few notes:

> with the library's getsize func

Please use the getlength function instead; the API of the getsize function is fundamentally broken (especially with non-English text) and shouldn't be used for text layout purposes.

> I think there should be an option to only break at word boundaries unless the word exceeds the max width, like the CSS word-break property

Word breaking is quite a difficult task, so I'd suggest constraining this to spaces to start with.
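A minimal sketch of that space-only approach, written for illustration and not taken from Pillow or from any of the scripts mentioned in this thread. The helper name `wrap_on_spaces` and its `measure` parameter are hypothetical; with Pillow, `measure` would be the `getlength` method of a loaded `FreeTypeFont`. A word wider than `max_width` is left unbroken on its own line, which is only one of the behaviours requested above.

```python
from typing import Callable, List


def wrap_on_spaces(text: str, max_width: float,
                   measure: Callable[[str], float]) -> List[str]:
    """Greedily pack space-separated words into lines no wider than max_width.

    `measure` returns the advance width of a string (hypothetical parameter;
    e.g. font.getlength for a Pillow FreeTypeFont).
    """
    lines: List[str] = []
    # Newlines already present in the text are kept as mandatory breaks.
    for paragraph in text.split("\n"):
        current = ""
        for word in paragraph.split(" "):
            candidate = word if not current else current + " " + word
            if current and measure(candidate) > max_width:
                # The next word does not fit: close the line and start a new one.
                lines.append(current)
                current = word
            else:
                current = candidate
        lines.append(current)
    return lines
```

With a real font this would be called as `wrap_on_spaces(text, 300, font.getlength)`, and the result joined with `"\n"` before being passed to `multiline_text`.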

> when reaching the max height, the font size is gradually reduced until the text fits inside

Not possible within the current API: a font is created at a given size and cannot be easily changed.

> also options to align text left, right, center

Already possible using the align parameter, only justify is not yet supported. It would also require extra work in the new proposed function, so I see that as a separate request.

> and top/middle/bottom text alignment

Already possible, use the anchor parameter with multiline text.
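As a rough illustration of combining the two existing parameters, the sketch below maps a box and two alignment names to the `xy`, `anchor` and `align` arguments that `ImageDraw.multiline_text` already accepts. The helper `box_placement` is hypothetical. It uses the `a`/`m`/`d` vertical anchors, which are the ones the Pillow documentation describes for multiline text, and anchors require an OpenType/TrueType font.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def box_placement(box: Box, halign: str = "left",
                  valign: str = "top") -> Tuple[Tuple[float, float], str, str]:
    """Return (xy, anchor, align) for placing multiline text inside `box`."""
    x0, y0, x1, y1 = box
    # Horizontal anchor letter and the x coordinate it refers to.
    horizontal = {"left": ("l", x0), "center": ("m", (x0 + x1) / 2), "right": ("r", x1)}
    # Vertical anchor letter (ascender / middle / descender) and its y coordinate.
    vertical = {"top": ("a", y0), "middle": ("m", (y0 + y1) / 2), "bottom": ("d", y1)}
    h_letter, x = horizontal[halign]
    v_letter, y = vertical[valign]
    return (x, y), h_letter + v_letter, halign


# Hypothetical usage with an existing ImageDraw `draw` and FreeTypeFont `font`:
#   xy, anchor, align = box_placement((10, 10, 310, 110), "center", "middle")
#   draw.multiline_text(xy, text, font=font, anchor=anchor, align=align)
```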

machineonamission · 2022-04-11T19:56:02Z

> Please use the getlength function instead; the API of the getsize function is fundamentally broken (especially with non-English text) and shouldn't be used for text layout purposes.

oh, good to know. probably should add that to the docs?

> Word breaking is quite a difficult task, it might be better to constrain this to spaces to start with.

right yeah I forgot CJK and other languages don't have definite characters at word boundaries. I just meant to break at whitespace, which as I understand it shouldn't be too hard and might actually be faster than individual character breaking. i'd add support for zero-width spaces to allow some external library or native speaker to mark word boundaries for pillow, assuming it's a non-trivial task

> Not possible within the current API, a font is created at a given size and cannot be easily changed.

ah so that's why the script I linked loads from file every change. is there an existing way to regenerate fonts without reloading from file or would that need to be an entire API change?

thanks for the quick and detailed response!

nulano · 2022-04-11T20:47:40Z

> oh, good to know. probably should add that to the docs?

I think it will be deprecated soon; it's just a matter of working out the replacement. font.getsize_multiline doesn't have a clear replacement, though that might be made easier by cleaning up the parameters as suggested in #6195 (comment). Discussed in #5816.

> i'd add support for zero-width spaces

That part was just a suggestion to avoid overcomplicating things. Sure, zero-width spaces can probably be supported.

> is there an existing way to regenerate fonts without reloading from file or would that need to be an entire API change?

I think you can load a font file in Python and then pass the bytes as input to ImageFont.truetype.
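A sketch of how that could be used to search for the largest font size that fits, without rereading the file from disk for every candidate. Everything here is illustrative: `largest_fitting_size`, `make_loader` and the `fits` callback are hypothetical names, and the binary search assumes that if one size fits, every smaller size fits too. `ImageFont.truetype` takes a filename or file-like object, so the bytes are wrapped in `io.BytesIO`.

```python
import io
from typing import Callable, Optional, TypeVar

F = TypeVar("F")


def largest_fitting_size(load_font: Callable[[int], F],
                         fits: Callable[[F], bool],
                         min_size: int, max_size: int) -> Optional[int]:
    """Binary-search the largest size in [min_size, max_size] whose font fits.

    Returns None when even min_size does not fit.
    """
    best = None
    lo, hi = min_size, max_size
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(load_font(mid)):
            best = mid        # mid fits, so try larger sizes
            lo = mid + 1
        else:
            hi = mid - 1      # mid is too big, so try smaller sizes
    return best


def make_loader(font_bytes: bytes) -> Callable[[int], object]:
    """Return a loader that builds a Pillow font from in-memory bytes."""
    from PIL import ImageFont  # imported lazily; the search itself needs no Pillow

    def load(size: int):
        return ImageFont.truetype(io.BytesIO(font_bytes), size)

    return load
```

Here `fits` would wrap the text with the candidate font and compare the resulting width and height against the box; the min and max font size properties from the original request become the two search bounds.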

machineonamission · 2022-04-11T20:49:34Z

I see. Thank you!

atomicparade · 2022-05-07T17:19:26Z

I've made a first attempt at implementing this, using a greedy algorithm:

https://github.com/atomicparade/pil_autowrap/blob/main/pil_autowrap/pil_autowrap.py#L73-L220

Example output here.

Issues:

- I don't know how appropriate the results are for Arabic and Hebrew. Chinese, Japanese, and Korean text is not broken up properly.
- I made the assumption that the line height is equal to the font size; however, looking at some of the generated images for Arabic and Hebrew, this doesn't appear to be the case. Maybe FreeTypeFont.getbbox would be more appropriate than FreeTypeFont.getlength?

Current blind spots and possible improvements:

- Test wrapping with non-breaking spaces.
- Allow wrapping after hyphens (but not after non-breaking hyphens).
- When there are multiple spaces between two words, preserve the number of spaces (as long as the text isn't wrapped between those two words).
- Add support for top-to-bottom text.

nulano · 2022-05-07T22:03:45Z

> I don't know how appropriate the results are for Arabic and Hebrew. Chinese, Japanese, and Korean text is not broken up properly.

If you are referring to this:

> Certain characters in those languages should not come at the end of a line, certain characters should not come at the start of a line, and some characters should never be split up across two lines. For example, periods and closing parentheses are not allowed to start a line

then I would not worry about it. Similar rules exist in some European languages and even MS Word doesn't really help there.

> I made the assumption that the line height is equal to the font size; however, looking at some of the generated images for Arabic and Hebrew, this doesn't appear to be the case. Maybe FreeTypeFont.getbbox would be more appropriate than FreeTypeFont.getlength?

The text height is calculated here:

Pillow/src/PIL/ImageDraw.py, lines 514 to 516 in 1340237:

    line_spacing = (
        self.textsize("A", font=font, stroke_width=stroke_width)[1] + spacing
    )

where spacing is a parameter defaulting to 4. This is not really accurate for some fonts, but it is used for historical reasons.

Do not use getbbox. That returns the height of the rendered text (which could be different for each line) and width of the rendered text (again, can be different with e.g. slanted text). It is not appropriate for text layout. Fonts generally don't exceed the line height and layout width they report, or only do so by a small amount when appropriate for stylistic reasons. (The height calculated above is not the actual line height reported by the font, but should be close enough in most cases).
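To make the max-height clipping case concrete, here is a rough sketch of how many lines fit in a given height when every line advances by a fixed line height plus `spacing`, as in the snippet above. `max_lines` is a hypothetical helper. Taking the line height from `font.getmetrics()` (ascent plus descent) is an assumption of this sketch, not what Pillow does internally, where the height of "A" is used.

```python
def max_lines(max_height: float, line_height: float, spacing: float = 4) -> int:
    """Number of lines that fit in max_height.

    n lines occupy (n - 1) * (line_height + spacing) + line_height pixels:
    each line after the first advances by line_height + spacing.
    """
    if line_height <= 0:
        raise ValueError("line_height must be positive")
    if line_height > max_height:
        return 0
    advance = line_height + spacing
    return 1 + int((max_height - line_height) // advance)


# Hypothetical usage with a loaded FreeTypeFont `font` and wrapped `lines`:
#   ascent, descent = font.getmetrics()
#   visible = lines[:max_lines(box_height, ascent + descent)]
```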

machineonamission · 2022-05-07T22:06:08Z

I feel that getting it working with “easier” languages first (ones that use white space or other characters to break words) would be the best thing to do right now as CJK word-breaking seems like a non-trivial task that could be hacked in by adding zero-width spaces. Is there an existing library that can determine word boundaries that could be included by PIL as an extra?

nulano · 2022-05-08T00:16:55Z

I may have misunderstood the Wikipedia article. The Unicode Line Breaking Algorithm is more helpful.

I think that it is probably sufficient to implement the non-tailorable part of the algorithm (see start of Table 1), which is just that line break characters are a mandatory break and spaces/zero-width spaces are an optional break. According to LineData.txt, this means it is sufficient to consider replacing the SPACE (U+0020) and ZERO WIDTH SPACE (U+200B) characters with "\n". The rest of the Unicode Line Breaking Algorithm would probably be best left to another library (e.g. by inserting zero-width spaces).
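A small sketch of that reduced rule set, for illustration only. The helper names `break_opportunities` and `break_at` are hypothetical, and only "\n" is treated as a mandatory break here, which is a simplification of the line break classes in the Unicode algorithm.

```python
from typing import List, Tuple

SPACE = "\u0020"
ZERO_WIDTH_SPACE = "\u200b"


def break_opportunities(text: str) -> List[Tuple[int, str]]:
    """List (index, kind) for each character where a line may or must break."""
    found = []
    for index, char in enumerate(text):
        if char == "\n":
            found.append((index, "mandatory"))
        elif char in (SPACE, ZERO_WIDTH_SPACE):
            found.append((index, "optional"))
    return found


def break_at(text: str, index: int) -> str:
    """Take an optional break by replacing the character at `index` with a newline."""
    return text[:index] + "\n" + text[index + 1:]
```

A wrapping routine would then measure the text up to each optional break with `getlength` and call `break_at` on the last one that still fits. Zero-width spaces inserted by another library or by hand become break points without changing how the unbroken text looks.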

atomicparade · 2022-05-09T02:10:47Z

> Is there an existing library that can determine word boundaries that could be included by PIL as an extra?

After a brief search, I couldn't find one that is freely available.

> The Unicode Line Breaking Algorithm is more helpful.

I'm going to give this a shot! I'll start with Table 1 and leave all of the other character classes as break-allowed for now, though I think I'd like to try to implement the others as well.

machineonamission · 2022-05-09T02:37:43Z

If you're going to implement the entire Unicode Line Breaking Algorithm, I recommend making it its own library.
If it's really complex or requires a table of characters or something, it could be specified as a PIL extra to not bloat PIL.

nulano · 2022-05-09T04:50:59Z

If you want to implement the full algorithm, it might make sense to add it to Raqm (which Pillow uses internally), or make it a separate library that Raqm can use. See HOST-Oman/libraqm#50

> requires a table of characters or something

The LineData.txt from Unicode I linked above is the official list of Unicode character line-breaking classes.

machineonamission · 2022-05-09T04:52:06Z

I wasn’t familiar enough with PIL’s internals to suggest that but that is a good idea

nulano · 2022-05-09T05:01:19Z

> > Is there an existing library that can determine word boundaries that could be included by PIL as an extra?
>
> After a brief search, I couldn't find one that is freely available.

The Raqm issue mentions https://github.com/adah1972/libunibreak. I haven't looked at it too closely, but it seems to be a C library implementing the Unicode algorithm that returns a list of valid break positions.

atomicparade · 2022-05-11T02:41:40Z

I am exploring adding a font_wraptext function (to start; would probably be nice to have a function to automatically determine an appropriate font [size] as well) to src/_imagingft.c and adding unibreak as a feature that depends on libunibreak being installed.

~~It does look like libunibreak maintains internal state (linebreak.c -> set_linebreaks_utf8), so I am not sure whether this will work well with multithreading.~~ (Never mind! It doesn’t.)

Edit: Somehow I completely missed the part about adding this as a feature to libraqm itself. Hmm...

DinoSourcesRex · 2023-09-20T08:55:02Z

Any news on this? Would love this feature.

aclark4life · 2023-09-20T15:35:35Z

Looks like a stalled attempt at greatness, maybe someone can pick up the effort: https://github.com/atomicparade/pil_autowrap

Pandede · 2025-01-23T08:17:45Z

Any progress on this topic? It would be an important feature.

radarhere added the Enhancement label Apr 11, 2022

machineonamission mentioned this issue Apr 12, 2022

Rework/optimize the captioning system machineonamission/mediaforge#90

Closed

nulano mentioned this issue Jun 5, 2022

ImageDraw.text text falling outside of image #6351

Closed

radarhere changed the title ~~Feature Request: automatic text wrapping/text box filling~~ Automatic text wrapping/text box filling Aug 5, 2022

radarhere mentioned this issue Mar 3, 2024

Multiline text is not centered correctly #2067

Closed

radarhere mentioned this issue Jan 29, 2025

Added "justify" align for multiline text #8721

Open
