[3.x] Batching - Add MultiRect command #68960
scene/resources/font.cpp (Outdated)
@@ -64,6 +76,8 @@ void Font::draw(RID p_canvas_item, const Point2 &p_pos, const String &p_text, co

    int chars_drawn = 0;
    bool with_outline = has_outline();

    set_multirect_enabled(true);
Most complex controls use `draw_char` directly, so the same probably should be done at least in the `Label`, `LineEdit`, `TextEdit` and `RichTextLabel` draw code.
Driver code looks fine to me.
I am unsure about adding the multirect helper class. It seems like it may be pretty prone to issues, especially in a multithreaded context. What would happen if two threads tried to add multirects simultaneously? To me it looks like they would have to flush the multirect buffer each time they added a rect (but since there is no mutex here I'm not sure that would reliably happen) which of course would defeat the purpose of the multirect.
The other piece I am worried about is that it appears that the flush doesn't happen automatically. For example, I don't see a flush at all in dynamic_font.cpp and I worry that could lead to an edge case where the multirect is never submitted.
Looking deeper I think I understand why it's implemented like this. It seems you are avoiding allocating an array in user-space for each of the areas that rely on the multi-rect command. So having the shared helper makes sense. Additionally, `canvas_item_add_texture_multirect_region()` takes the whole multirect at once so you need to buffer the rects somewhere before calling `canvas_item_add_texture_multirect_region()`.
It may be more efficient to just make use of the multirect class in user-space. For example, in tilemap.cpp it would be fairly trivial to buffer the rects and then just make one call to `canvas_item_add_texture_multirect_region()`.
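To illustrate the user-space buffering idea, here is a minimal sketch. Everything in it is a hypothetical stand-in: `Rect`, `FakeCanvas` and `add_multirect()` are not Godot types, and the real signature of `canvas_item_add_texture_multirect_region()` is not shown in this thread, so the parameters are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2D rectangle; not Godot's Rect2.
struct Rect {
	float x, y, w, h;
};

// Hypothetical canvas that only counts what it receives.
struct FakeCanvas {
	int calls = 0; // number of multirect submissions
	std::size_t rects = 0; // total rects received

	// Stand-in for canvas_item_add_texture_multirect_region();
	// the real parameter list is an assumption here.
	void add_multirect(const std::vector<Rect> &dst, const std::vector<Rect> &src) {
		(void)src;
		++calls;
		rects += dst.size();
	}
};

// Buffer every tile of a grid locally, then submit the whole group once.
inline void draw_tiles_batched(FakeCanvas &canvas, int columns, int rows, float tile_size) {
	std::vector<Rect> dst;
	std::vector<Rect> src;
	dst.reserve(static_cast<std::size_t>(columns) * static_cast<std::size_t>(rows));
	src.reserve(dst.capacity());

	for (int y = 0; y < rows; ++y) {
		for (int x = 0; x < columns; ++x) {
			dst.push_back(Rect{ x * tile_size, y * tile_size, tile_size, tile_size });
			// All tiles sample the same atlas region in this toy example.
			src.push_back(Rect{ 0.0f, 0.0f, tile_size, tile_size });
		}
	}

	// One submission for the whole group instead of one per tile.
	if (!dst.empty()) {
		canvas.add_multirect(dst, src);
	}
}
```

Because the buffer is local to the caller, no state is shared between threads, at the cost of an allocation per draw.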
Alternatively, perhaps VisualServerCanvasHelper could be implemented to return an allocated-on-demand multirect so you can expose the same easy to use API. Then at the end of a frame, you could flush all the multirects and put them back in the pool for reuse. This would take more memory than the current static approach, but it should be able to use the same easy-to-use API, add warnings for unflushed uses, and avoid threading issues.
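A rough sketch of what such an allocate-on-demand pool could look like, assuming a mutex-guarded free list. `MultiRectPool`, `acquire()`, `release()` and `end_frame()` are hypothetical names for illustration, not part of `VisualServerCanvasHelper`.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in types; not Godot's.
struct Rect {
	float x, y, w, h;
};

struct MultiRect {
	std::vector<Rect> rects;
	bool in_use = false;
};

class MultiRectPool {
	std::mutex mutex;
	std::vector<std::unique_ptr<MultiRect>> all; // owns every buffer ever allocated
	std::vector<MultiRect *> free_list; // buffers ready for reuse

public:
	// Hand out a buffer, allocating a new one only when none is free.
	MultiRect *acquire() {
		std::lock_guard<std::mutex> lock(mutex);
		MultiRect *mr = nullptr;
		if (free_list.empty()) {
			all.push_back(std::unique_ptr<MultiRect>(new MultiRect()));
			mr = all.back().get();
		} else {
			mr = free_list.back();
			free_list.pop_back();
		}
		mr->in_use = true;
		return mr;
	}

	// Called by a user after it has submitted the rects itself.
	void release(MultiRect *mr) {
		std::lock_guard<std::mutex> lock(mutex);
		mr->rects.clear();
		mr->in_use = false;
		free_list.push_back(mr);
	}

	// End of frame: reclaim anything still checked out and report how many
	// buffers were left unflushed (a real version would submit them and warn).
	std::size_t end_frame() {
		std::lock_guard<std::mutex> lock(mutex);
		std::size_t unflushed = 0;
		for (std::unique_ptr<MultiRect> &mr : all) {
			if (mr->in_use) {
				++unflushed;
				mr->rects.clear();
				mr->in_use = false;
				free_list.push_back(mr.get());
			}
		}
		return unflushed;
	}
};
```

Each thread fills only the buffer it acquired, so the mutex is taken on acquire and release rather than on every rect.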
I benchmarked this: OS: Fedora 36 (KDE + KWin with compositing disabled) GLES2 is used (project default). A non-editor release build with LTO enabled is used for both "Before" and "After (this PR)". I've also tested GLES3 and it showed similar performance characteristics.
Thanks for comments. Agree on the threading, I'll add the suggestions to the PR when I'm back at home next week. 👍
Stack friendly fixed size MultiRect
After trying an approach using a thread safe pool of
The current size on the stack would be approx:
This should be as most modern stacks are 1Mb, (or 512Kb to be safe), and the
So this new version does most of the filling of
The exception is tilemaps:

Tilemaps
I've slightly altered the mechanism for tilemaps to make it a little more optimal. Instead of using the same general MultiRect cache, it uses a single set of caches (and a single mutex so that only one tilemap quadrant can be filled at a time). However the difference is that the tilemap MultiRect caches can be filled out of order. I.e. you can fill cache 0, start cache 1, then return to filling 0 if a similar set of rects is detected in the quadrant (and there is no overlap via an overlap test). The overlap test is a slight expense, but this will tend to be done as a one off rather than every frame as in the batching, so is a lot more efficient. And if a quadrant contains a lot of changes between swaps and transposes etc, then these will now be efficiently transcribed into a small number of
The

Other Canvas Node types
Although @bruvzg suggested there may be a few more canvas types using
This includes:
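For reference, the out-of-order tilemap filling described above could be sketched roughly as follows. This is only one reading of the description, not the PR's code: `Cache`, `add_rect()` and the integer `state` key (standing in for texture plus swap/transpose flags) are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in types; not the PR's actual data structures.
struct Rect {
	float x, y, w, h;
};

struct Cache {
	int state; // assumed key, e.g. texture plus transpose / flip flags
	std::vector<Rect> rects;
};

// Axis-aligned overlap test; rects that merely touch do not overlap.
inline bool overlaps(const Rect &a, const Rect &b) {
	return a.x < b.x + b.w && b.x < a.x + a.w &&
			a.y < b.y + b.h && b.y < a.y + a.h;
}

// Adds a rect to the newest compatible cache it can reach without
// changing the visible draw order. Returns the index of the cache used.
inline std::size_t add_rect(std::vector<Cache> &caches, int state, const Rect &r) {
	// Walk from the newest cache back towards cache 0.
	for (std::size_t i = caches.size(); i-- > 0;) {
		if (caches[i].state == state) {
			// Every later cache was already checked for overlap.
			caches[i].rects.push_back(r);
			return i;
		}
		// r would end up drawn before caches[i], which is only safe
		// if it overlaps none of the rects in that cache.
		bool blocked = false;
		for (const Rect &other : caches[i].rects) {
			if (overlaps(r, other)) {
				blocked = true;
				break;
			}
		}
		if (blocked) {
			break;
		}
	}
	// No reachable compatible cache: start a new one.
	Cache c;
	c.state = state;
	c.rects.push_back(r);
	caches.push_back(c);
	return caches.size() - 1;
}
```

The overlap check costs time proportional to the rects in the skipped caches, which matters less when a quadrant is rebuilt occasionally rather than every frame.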
Large groups of similar rects can be processed more efficiently using the MultiRect command. Processing common to the group can be done as a one off, instead of per rect. Adds the new API to VisualServerCanvas, and uses the new functionality from Font, BitmapFont, DynamicFont and TileMap, via the VisualServerCanvasHelper class.
Looks good to me. I like this new approach to the canvas helper a lot more.
Tested locally and can confirm that there is a pretty nice speedup with a lot of text in a scene (I tested with a few thousand labels).
Thanks!
Large groups of similar rects can be processed more efficiently using the `MultiRect` command. Processing common to the group can be done as a one off, instead of per rect.

Adds the new API to VisualServerCanvas, and uses the new functionality from Font, BitmapFont, DynamicFont and TileMap, via the VisualServerCanvasHelper class.

Can be switched on and off with project setting `rendering/batching/options/use_multirect` (defaults to on), just in case of regressions.

Some measurements from a test benchmark
16 fps - without batching
190 fps - with batching
425 fps - with batching + `MultiRect`

Test benchmark included below
(the reason these are faster seems to be that the smaller font doesn't support all the Chinese characters used in the benchmark, to save on the download size)
43 fps - without batching
450 fps - with batching
1000 fps - with batching + `MultiRect`
Notes
`Font::draw()` command (passing a string, rather than by char)
`MultiRect`s within a quadrant.

Test benchmark
batch_test_text.zip
Press the left button 10 times. Test with batching off / on, and MultiRect off / on.