Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Batching - Add MultiRect command #68960

Merged
merged 1 commit into from
Apr 17, 2023
Merged

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented Nov 21, 2022

Large groups of similar rects can be processed more efficiently using the MultiRect command. Processing common to the group can be done as a one off, instead of per rect.

Adds the new API to VisualServerCanvas, and uses the new functionality from Font, BitmapFont, DynamicFont and TileMap, via the VisualServerCanvasHelper class.

Can be switched on and off with project setting rendering/batching/options/use_multirect (defaults to on), just in case of regressions.

Some measurements from a test benchmark

16 fps - without batching
190 fps - with batching
425 fps - with batching + MultiRect

Test benchmark included below
(reason these are faster seems to be the smaller font doesn't support all the chinese characters used in the benchmark, to save on the download size)
43 fps - without batching
450 fps - with batching
1000 fps - with batching + MultiRect

Notes

  • Increases performance in benchmarks of 2.4x over raw batching (large amounts of text, small fill rate)
  • Real world performance increases will probably be a lot more modest, but seems worth doing as the PR is relatively simple, and opens up further avenues for optimization
  • Works through the fonts so will apply to most text using the Font::draw() command (passing a string, rather than by char)
  • Also works for tilemap quadrants, there is special helper for caching and filling multiple MultiRects within a quadrant.
  • As discussed with @clayjohn , could potentially be very interesting to add to 4.x, as 4.x has no batching especially with vulkan. Although the specifics would need to be different - the font access may be different, and it would need a vulkan / GL backend for rendering the new command.

Test benchmark

batch_test_text.zip
Press left button 10 times. Test with batching off / on, and multirect off on.

@lawnjelly lawnjelly force-pushed the multirect branch 2 times, most recently from 00c3d58 to e70fad4 Compare November 21, 2022 16:08
@lawnjelly lawnjelly marked this pull request as ready for review November 21, 2022 16:08
@lawnjelly lawnjelly requested review from a team as code owners November 21, 2022 16:08
@lawnjelly lawnjelly added this to the 3.6 milestone Nov 21, 2022
@@ -64,6 +76,8 @@ void Font::draw(RID p_canvas_item, const Point2 &p_pos, const String &p_text, co

int chars_drawn = 0;
bool with_outline = has_outline();

set_multirect_enabled(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most complex controls use draw_char directly, so the same probably should be done at least in the Label, LineEdit, TextEdit and RichTextLabel draw code.

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driver code looks fine to me.

I am unsure about adding the multirect helper class. It seems like it may be pretty prone to issues, especially in a multithreaded context. What would happen if two threads tried to add multirects simultaneously? To me it looks like they would have to flush the multirect buffer each time they added a rect (but since there is no mutex here I'm not sure that would reliably happen) which of course would defeat the purpose of the multirect.

The other piece I am worried about is that it appears that the flush doesn't happen automatically. For example, I don't see a flush at all in dynamic_font.cpp and I worry that could lead to an edge case where the multirect is never submitted.

Looking deeper I think I understand why its implemented like this. It seems you are avoiding allocating an array in user-space for each of the areas that rely on the multi-rect command. So having the shared helper makes sense. Additionally, canvas_item_add_texture_multirect_region() takes the whole multirect at once so you need to buffer the rects somewhere before calling canvas_item_add_texture_multirect_region().

It may be more efficient to just make use of the multirect class in user-space. For example, in tilemap.cpp it would be fairly trivial to buffer the rects and then just make one call to canvas_item_add_texture_multirect_region().

Alternatively, perhaps VisualServerCanvasHelper could be implemented to return an allocated-on-demand multirect so you can expose the same easy to use API. Then at the end of a frame, you could flush all the multirects and put them back in the pool for reuse. This would take more memory than the current static approach, but it should be able to use the same easy-to-use API, add warnings for unflushed uses, and avoid threading issues.

@Calinou
Copy link
Member

Calinou commented Dec 2, 2022

I benchmarked this:

OS: Fedora 36 (KDE + KWin with compositing disabled)
CPU: Intel Core i7-6700K @ 4.4 GHz
GPU: AMD Radeon RX 6900 XT

GLES2 is used (project default). A non-editor release build with LTO enabled is used for both "Before" and "After (this PR)".

I've also tested GLES3 and it showed similar performance characteristics.

Type Before After (this PR) Relative performance
120×120 window (empty) 10353 FPS 10967 FPS 1.06×
120×120 window (full of text) 1813 FPS 2747 FPS 1.52×
3840×2160 fullscreen (empty) 9501 FPS 9913 FPS 1.04×
3840×2160 fullscreen (full of text)1 93 FPS 256 FPS 2.75×

Warning
Multirect is disabled in the test project's settings, so make sure to enable it in the project settings first.

Footnotes

  1. Until text reaches the bottom of the screen, so there is much more text here (as the test project uses the disabled stretch mode).

@lawnjelly
Copy link
Member Author

Thanks for comments. Agree on the threading, I'll add the suggestions to the PR when I'm back at home next week. 👍

@lawnjelly lawnjelly requested a review from a team as a code owner December 6, 2022 13:00
@lawnjelly lawnjelly marked this pull request as draft December 6, 2022 13:08
@lawnjelly lawnjelly force-pushed the multirect branch 2 times, most recently from 7103ccd to 0f4c030 Compare December 6, 2022 16:36
@lawnjelly lawnjelly marked this pull request as ready for review December 6, 2022 16:45
@lawnjelly
Copy link
Member Author

lawnjelly commented Dec 6, 2022

Stack friendly fixed size MultiRect

After trying an approach using a thread safe pool of MultiRects for fonts etc, I decided I preferred the approach of avoiding the mutex and pool by just using fixed max size MultiRect (helper class) and allowing allocating on the stack, which is a lot cheaper than the dynamic allocations for both the MultiRect itself and the rect and sources lists.

The current size on the stack would be approx:
(Rect (16 bytes) + Source (16 bytes) ) * max_rects (2048) = 64Kb + a little for state.

This should be as most modern stacks are 1Mb, (or 512Kb to be safe), and the MultiRects are unlikely to be used in a recursive fashion like this.

So this new version does most of the filling of MultiRects using the MultiRect helper class, which is fixed in size.

The exception is tilemaps:

Tilemaps

I've slightly altered the mechanism for tilemaps to make it a little more optimal. Instead of using the same general MultiRect cache, it uses a single set of caches (and a single mutex so that only one tilemap quadrant can be filled at a time). However the difference is that the tilemap MultiRect caches can be filled out of order. I.e. you can fill cache 0, start cache 1, then return to filling 0 if a similar set of rects is detected in the quadrant (and there is no overlap via an overlap test).

The overlap test is a slight expense, but this will tend to be done as a one off rather than every frame as in the batching, so is a lot more efficient. And if a quadrant contains a lot of changes between swaps and transposes etc, then these will now be efficiently transcribed into a small number of MultiRects.

The MultiRect system with add_char_ex() currently doesn't act for user created nodes that use add_char(), mainly to keep backward compatibility and keep the API the same. This shouldn't be a big problem as it is reasonably fast even if MultiRect is not used. Although we might consider changing the API here if we roll out to 4.x, as there is no batching in Vulkan, so the potential benefits will be greater.

Other Canvas Node types

Although @bruvzg suggested there may be a few more canvas types using add_char(), it turns out most use the FontDrawer helper class. As FontDrawer has been converted to use MultiRect, all the canvas classes that use it get MultiRect functionality for free.

This includes:

  • ItemList
  • Label
  • LineEdit
  • RichTextLabel
  • TextEdit

@lawnjelly lawnjelly force-pushed the multirect branch 2 times, most recently from 68459b5 to 1c23988 Compare December 21, 2022 12:16
@lawnjelly lawnjelly marked this pull request as draft December 21, 2022 12:30
@lawnjelly lawnjelly force-pushed the multirect branch 4 times, most recently from 31d4995 to 0cbfffd Compare December 21, 2022 15:28
Large groups of similar rects can be processed more efficiently using the MultiRect command. Processing common to the group can be done as a one off, instead of per rect.

Adds the new API to VisualServerCanvas, and uses the new functionality from Font, BitmapFont, DynamicFont and TileMap, via the VisualServerCanvasHelper class.
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I like this new approach to the canvas helper a lot more.

Tested locally and can confirm that there is a pretty nice speedup with a lot of text in a scene (I tested with a few thousand labels)

@akien-mga akien-mga merged commit 4c5a934 into godotengine:3.x Apr 17, 2023
@akien-mga
Copy link
Member

Thanks!

@lawnjelly lawnjelly deleted the multirect branch April 17, 2023 15:48
@akien-mga akien-mga changed the title Batching - Add MultiRect command [3.x] Batching - Add MultiRect command Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants