-
-
Notifications
You must be signed in to change notification settings - Fork 21.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add batching to RendererCanvasRenderRD
#92797
Add batching to RendererCanvasRenderRD
#92797
Conversation
GCC warning to fix:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally, it works as expected.
Testing project: test_2d_batching_1x1_sprite.zip
This project renders 10,000 Sprite2D nodes that are 1x1 each at a fixed position. This creates a stress point for the CPU while not using the GPU much (as each sprite is tiny).
Benchmark
Using a Linux optimized editor build (optimize=speed lto=full
).
PC specifications
- CPU: Intel Core i9-13900K
- GPU: NVIDIA GeForce RTX 4090
- RAM: 64 GB (2×32 GB DDR5-5800 C30)
- SSD: Solidigm P44 Pro 2 TB
- OS: Linux (Fedora 40)
Results are in 3840x2160 fullscreen, but results in 1152x648 are pretty much identical since this is a CPU-bottlenecked scenario.
Before | After (this PR, old implementation) | After (this PR, new implementation) | Compatibility1 |
---|---|---|---|
935 FPS (1.07 mspf) | 2145 FPS (0.47 mspf) | 1632 FPS (0.62 mspf) | 1563 FPS (0.64 mspf) |
Software rendering with llvmpipe (1152x648):
Before | After (this PR, old implementation) | After (this PR, new implementation) | Compatibility1 |
---|---|---|---|
27 FPS (37.04 mspf) | 183 FPS (5.46 mspf) | 185 FPS (5.41 mspf) | N/A |
Footnotes
3d398ff
to
4ff6d08
Compare
Fixed! Strange, as that code is a duplicate of the gles3, so they must have that warning disabled?: godot/drivers/gles3/rasterizer_canvas_gles3.cpp Line 2589 in 96a386f
|
It isn't fully, the forward code uses a |
4ff6d08
to
7d09361
Compare
7d09361
to
3b33320
Compare
a255385
to
dfd683c
Compare
dfd683c
to
d4369f9
Compare
@@ -2311,6 +2311,9 @@ | |||
[b]Note:[/b] This property is only read when the project starts. To change the physics FPS at runtime, set [member Engine.physics_ticks_per_second] instead. | |||
[b]Note:[/b] Only [member physics/common/max_physics_steps_per_frame] physics ticks may be simulated per rendered frame at most. If more physics ticks have to be simulated per rendered frame to keep up with rendering, the project will appear to slow down (even if [code]delta[/code] is used consistently in physics calculations). Therefore, it is recommended to also increase [member physics/common/max_physics_steps_per_frame] if increasing [member physics/common/physics_ticks_per_second] significantly above its default value. | |||
</member> | |||
<member name="rendering/2d/batching/item_buffer_size" type="int" setter="" getter="" default="16384"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The equivalent for this in the compatibility renderer is rendering/gl_compatibility/item_buffer_size
. I wish I had had the foresight to make the setting something backend-agnostic like the name you use here.
After we merge this PR, we should deprecate rendering/gl_compatibility/item_buffer_size
and use this setting for both the RD backend and the compatibility backend. I think it should be possible to stop writing rendering/gl_compatibility/item_buffer_size
and to just read it if present, then delete it.
866c80b
to
ce80320
Compare
cdd67ee
to
0df383b
Compare
Important I've pushed an alternative version to this branch that is more efficient and reduces branching in the CPU side. Further, it uses The previous version is still available via sgc/canvas_batching_prev |
Rebased so the build can run; I'll remove the fallback code and squash the commits next. |
2bc0dae
to
f4232ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested after the rebase and it looks and runs great!
Tested briefly on Linux (AMD Radeon RX 7600M XT), with unoptimized Simple 2D scene, editor FPS with Update Continuously and VSync off:
MRP from #85871, zoomed out to around 7%, with grid enabled, and using
|
f4232ca
to
a657ea4
Compare
Amazing work once again 👏 🎉 |
I'm seeing a regression in font subpixel antialiasing that seems to have originated from this pull request, where I get these weird black artifacts around a lot of the text in the editor. I opened an issue for it here: #97195 |
I'm afraid this PR broke mobile support because it uses slots beyond the minimum guaranteed maxBoundDescriptorSets = 4 by Vulkan spec. I create a ticket for that issue here: #97341 |
@stuartcarnie @clayjohn Awesome work!, any chance this batching could be extended to the other drawing functions ? ; draw_texture(), draw_rect(), draw_line() are batched!, however drawing functions like draw_arc() draw_circle(), draw_polygon(), etc... seem to make a separate draw call each time they are drawn (tested on 4.4 dev6). I think having them batched might attract people that look to game frameworks (like raylib, sdl2) for more performance with basic shapes.. |
It's possible. We've discussed an approach to batching other primitives and I even wrote up some notes on how it could work, but no one has been interested in implementing it. So feel free to take a stab at it if you want! |
This PR adds batching support for
RendererCanvasRenderRD
. It borrows much of the implementation from the OpenGL compatibility renderer, with additional optimisations to reduce branching and extra work in the hot loops.Issue #85871 was a particularly interesting case, as zoomed to about 9% on my MacBook Pro M1: