Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add batching to RendererCanvasRenderRD #92797

Merged
merged 1 commit into from
Sep 11, 2024

Conversation

stuartcarnie
Copy link
Contributor

@stuartcarnie stuartcarnie commented Jun 5, 2024

This PR adds batching support for RendererCanvasRenderRD. It borrows much of the implementation from the OpenGL compatibility renderer, with additional optimisations to reduce branching and extra work in the hot loops.

Issue #85871 was a particularly interesting case, as zoomed to about 9% on my MacBook Pro M1:

area before after
draw calls (for tile map) > 573440 36
FPS ~10 > 130

@akien-mga
Copy link
Member

GCC warning to fix:

servers/rendering/renderer_rd/renderer_canvas_render_rd.cpp: In member function 'void RendererCanvasRenderRD::_allocate_instance_buffer()':
Error: servers/rendering/renderer_rd/renderer_canvas_render_rd.cpp:3903:47: error: comparison of integer expressions of different signedness: 'int' and 'unsigned int' [-Werror=sign-compare]
 3903 |  if (int(state.current_instance_buffer_index) < state.canvas_instance_data_buffers[state.current_data_buffer_index].instance_buffers.size()) {
      |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, it works as expected.

Testing project: test_2d_batching_1x1_sprite.zip
This project renders 10,000 Sprite2D nodes that are 1x1 each at a fixed position. This creates a stress point for the CPU while not using the GPU much (as each sprite is tiny).

Benchmark

Using a Linux optimized editor build (optimize=speed lto=full).

PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 40)

Results are in 3840x2160 fullscreen, but results in 1152x648 are pretty much identical since this is a CPU-bottlenecked scenario.

Before After (this PR, old implementation) After (this PR, new implementation) Compatibility1
935 FPS (1.07 mspf) 2145 FPS (0.47 mspf) 1632 FPS (0.62 mspf) 1563 FPS (0.64 mspf)

Software rendering with llvmpipe (1152x648):

Before After (this PR, old implementation) After (this PR, new implementation) Compatibility1
27 FPS (37.04 mspf) 183 FPS (5.46 mspf) 185 FPS (5.41 mspf) N/A

Footnotes

  1. Not affected by this PR, but this is present for reference. 2

@stuartcarnie stuartcarnie force-pushed the sgc/canvas_batching branch from 3d398ff to 4ff6d08 Compare June 5, 2024 19:50
@stuartcarnie
Copy link
Contributor Author

GCC warning to fix:

servers/rendering/renderer_rd/renderer_canvas_render_rd.cpp: In member function 'void RendererCanvasRenderRD::_allocate_instance_buffer()':
Error: servers/rendering/renderer_rd/renderer_canvas_render_rd.cpp:3903:47: error: comparison of integer expressions of different signedness: 'int' and 'unsigned int' [-Werror=sign-compare]
 3903 |  if (int(state.current_instance_buffer_index) < state.canvas_instance_data_buffers[state.current_data_buffer_index].instance_buffers.size()) {
      |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixed!

Strange, as that code is a duplicate of the gles3, so they must have that warning disabled?:

if (int(state.current_instance_buffer_index) < state.canvas_instance_data_buffers[state.current_data_buffer_index].instance_buffers.size()) {

@AThousandShips
Copy link
Member

Strange, as that code is a duplicate of the gles3

It isn't fully, the forward code uses a LocalVector but gles uses Vector

@stuartcarnie stuartcarnie force-pushed the sgc/canvas_batching branch from 4ff6d08 to 7d09361 Compare June 5, 2024 21:20
@stuartcarnie stuartcarnie requested a review from a team as a code owner June 5, 2024 21:20
@stuartcarnie stuartcarnie force-pushed the sgc/canvas_batching branch from 7d09361 to 3b33320 Compare June 5, 2024 23:00
@stuartcarnie stuartcarnie force-pushed the sgc/canvas_batching branch 2 times, most recently from a255385 to dfd683c Compare June 6, 2024 20:43
@stuartcarnie stuartcarnie force-pushed the sgc/canvas_batching branch from dfd683c to d4369f9 Compare June 8, 2024 23:44
@clayjohn clayjohn modified the milestones: 4.x, 4.4 Jun 12, 2024
@@ -2311,6 +2311,9 @@
[b]Note:[/b] This property is only read when the project starts. To change the physics FPS at runtime, set [member Engine.physics_ticks_per_second] instead.
[b]Note:[/b] Only [member physics/common/max_physics_steps_per_frame] physics ticks may be simulated per rendered frame at most. If more physics ticks have to be simulated per rendered frame to keep up with rendering, the project will appear to slow down (even if [code]delta[/code] is used consistently in physics calculations). Therefore, it is recommended to also increase [member physics/common/max_physics_steps_per_frame] if increasing [member physics/common/physics_ticks_per_second] significantly above its default value.
</member>
<member name="rendering/2d/batching/item_buffer_size" type="int" setter="" getter="" default="16384">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The equivalent for this in the compatibility renderer is rendering/gl_compatibility/item_buffer_size. I wish I had had the foresight to make the setting something backend-agnostic like the name you use here.

After we merge this PR, we should deprecate rendering/gl_compatibility/item_buffer_size and use this setting for both the RD backend and the compatibility backend. I think it should be possible to stop writing rendering/gl_compatibility/item_buffer_size and to just read it if present, then delete it.

@stuartcarnie
Copy link
Contributor Author

stuartcarnie commented Jul 29, 2024

Important

I've pushed an alternative version to this branch that is more efficient and reduces branching in the CPU side.

Further, it uses bitfieldExtract to reduce branching on the GPU side.

The previous version is still available via sgc/canvas_batching_prev

@stuartcarnie
Copy link
Contributor Author

Rebased so the build can run; I'll remove the fallback code and squash the commits next.

stuartcarnie added a commit to stuartcarnie/godot that referenced this pull request Sep 10, 2024
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested after the rebase and it looks and runs great!

@akien-mga
Copy link
Member

Tested briefly on Linux (AMD Radeon RX 7600M XT), with unoptimized dev_build.

Simple 2D scene, editor FPS with Update Continuously and VSync off:

  • FPS master: 460-500
  • FPS with this PR: 600-650

MRP from #85871, zoomed out to around 7%, with grid enabled, and using forward_plus:

  • FPS master: 3-4
  • FPS with this PR: 12-13
    Same project with 32% zoom:
  • FPS master: ~40
  • FPS with this PR: ~70

stuartcarnie added a commit to stuartcarnie/godot that referenced this pull request Sep 11, 2024
@akien-mga akien-mga merged commit 2c136e6 into godotengine:master Sep 11, 2024
20 checks passed
@akien-mga
Copy link
Member

Amazing work once again 👏 🎉

@mihe
Copy link
Contributor

mihe commented Sep 19, 2024

I'm seeing a regression in font subpixel antialiasing that seems to have originated from this pull request, where I get these weird black artifacts around a lot of the text in the editor.

I opened an issue for it here: #97195

@darksylinc
Copy link
Contributor

darksylinc commented Sep 22, 2024

I'm afraid this PR broke mobile support because it uses slots beyond the minimum guaranteed maxBoundDescriptorSets = 4 by Vulkan spec.

I create a ticket for that issue here: #97341

@kyle-wannacott
Copy link

kyle-wannacott commented Dec 18, 2024

@stuartcarnie @clayjohn Awesome work!, any chance this batching could be extended to the other drawing functions ? ; draw_texture(), draw_rect(), draw_line() are batched!, however drawing functions like draw_arc() draw_circle(), draw_polygon(), etc... seem to make a separate draw call each time they are drawn (tested on 4.4 dev6). I think having them batched might attract people that look to game frameworks (like raylib, sdl2) for more performance with basic shapes..

@clayjohn
Copy link
Member

@stuartcarnie @clayjohn Awesome work!, any chance this batching could be extended to the other drawing functions ? ; draw_texture(), draw_rect(), draw_line() are batched!, however drawing functions like draw_arc() draw_circle(), draw_polygon(), etc... seem to make a separate draw call each time they are drawn (tested on 4.4 dev6). I think having them batched might attract people that look to game frameworks (like raylib, sdl2) for more performance with basic shapes..

It's possible. We've discussed an approach to batching other primitives and I even wrote up some notes on how it could work, but no one has been interested in implementing it. So feel free to take a stab at it if you want!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vulkan: Editor constantly uses 1-2% CPU while doing nothing on empty project due to Vulkan lacking 2D batching