2D Batching is extremely slow on Android (GLES3) #53537

Closed
Poobslag opened this issue Oct 7, 2021 · 9 comments

Poobslag commented Oct 7, 2021

Godot version

v3.3.4.stable.official [faf3f88]

System information

Galaxy Note9 REV1.1, Android 10, GLES3

Issue description

2D batching causes a severe performance degradation on Android. With 2D batching disabled, the attached demo can render 1,920 sprite nodes at 60 FPS. With 2D batching enabled, it runs at 3 FPS. This is on my Samsung Galaxy Note9 (SM-N960U) running Android 10.

2D batching enabled:

[Screenshot: Screenshot_20211007-112230_polygon2d_performance]

2D batching disabled:

[Screenshot: Screenshot_20211007-112131_polygon2d_performance]

Steps to reproduce

  1. Deploy the attached demo on a Samsung Galaxy Note9 or similar Android device. (You will need to configure an export preset and a debug keystore.) 2D batching is enabled by default.
  2. Click the "x2" button to increase the number of sprite nodes from 480 to 1920. With 2D batching enabled, the FPS drops to 3.
  3. Edit the Project Settings, disabling 2D Batching (Project -> Project Settings, Rendering -> Batching -> Use Batching = Off)
  4. Deploy the demo a second time. Click the "x2" button to increase the number of sprite nodes. The FPS stays at 60.

Minimal reproduction project

polygon2d_performance.zip

Member

Calinou commented Oct 7, 2021

I wonder if this is related to 2D lighting. Can you reproduce this if you hide all lights in the scene? Also try reproducing this with only one light visible in the scene.

Calinou added this to the 3.4 milestone Oct 7, 2021
Author

Poobslag commented Oct 7, 2021

With 4 lights visible, the demo runs at 3 FPS.
With 1 light visible, the demo runs at 7 FPS.
With 0 lights visible, the demo runs at 60 FPS.

The lights are definitely influencing the degradation.
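
To make the gap easier to compare, the reported frame rates can be converted to frame times. The sketch below uses only the three FPS figures from this comment. The per-light estimate assumes cost grows linearly with light count, which nothing in this thread establishes, and the 60 FPS figure may simply be a display cap, so the 0-light frame time is best read as an upper bound.

```python
# Frame rates reported above, keyed by number of visible lights.
fps_by_lights = {0: 60, 1: 7, 4: 3}


def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


frame_times = {lights: frame_time_ms(fps) for lights, fps in fps_by_lights.items()}

for lights, ms in sorted(frame_times.items()):
    print(f"{lights} light(s): {ms:.1f} ms per frame")

# Rough extra cost per light between the 1-light and 4-light runs.
# Assumption: cost scales linearly with the number of lights.
extra_per_light_ms = (frame_times[4] - frame_times[1]) / (4 - 1)
print(f"about {extra_per_light_ms:.0f} ms per additional light")
```

Under those assumptions the frame time goes from roughly 16.7 ms with no lights to 142.9 ms with one light and 333.3 ms with four, or about 63 ms for each light after the first.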

Calinou changed the title from "2D Batching is extremely slow on Android" to "2D Batching is extremely slow on Android (GLES3)" Oct 7, 2021
Member

lawnjelly commented Oct 7, 2021

Ah, thanks for posting (from Reddit)!

Yes, it will be the lights. Lights are very difficult to deal with in terms of batching, and sometimes there are 'worst case scenarios' in which there is no way to batch the draw calls without affecting the end result, so none of the commands are batched.

Sometimes tricks are needed with lights, as you can easily end up in situations where merged sprites end up using more fill rate than individual sprites (and mobile is very sensitive to fill rate), especially with large lights, which is what you are using. This is the reason for the rendering/batching/lights/scissor_area_threshold setting.

There's some info on this in the docs:
https://docs.godotengine.org/en/stable/tutorials/optimization/batching.html#lights
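
As a rough illustration of how the scissor_area_threshold value maps to pixels: as I read the Godot 3.x project settings documentation, the value is raised to the 4th power and multiplied by the total screen area to give the number of pixels a scissor must save before light scissoring activates. The sketch below assumes that rule. The function name and the 1920x1080 viewport are hypothetical, and none of this is taken from the engine source.

```python
def scissor_threshold_pixels(setting_value, screen_width, screen_height):
    """Pixel area a scissor must save before light scissoring activates.

    Assumption: follows the rule described in the Godot 3.x docs, where the
    setting is raised to the 4th power and multiplied by the screen area.
    """
    if not 0.0 <= setting_value <= 1.0:
        raise ValueError("setting_value is expected to be in [0, 1]")
    return (setting_value ** 4) * screen_width * screen_height


# Hypothetical viewport size; the thread does not state the render size.
WIDTH, HEIGHT = 1920, 1080

for value in (1.0, 0.5, 0.1, 0.0):
    pixels = scissor_threshold_pixels(value, WIDTH, HEIGHT)
    print(f"scissor_area_threshold={value}: save at least {pixels:,.0f} px")
```

If that reading is right, low values scissor far more aggressively than the raw number suggests, because the 4th power emphasises the lower end of the range.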

There are a number of other parameters that may help your situation too. I'll see if I can work out what is going on from the source and a diagnose frame. There are likely some situations which are so extreme with lighting and overlaps that batching can't help with them (in some situations there is literally no way to batch without changing the end result).

Yes, because of the long chains of overlapping sprites you are literally getting no batching going on above the x1 setting. Here's a diagnose log:

canvas_begin FRAME 399
items
	joined_item 1 refs, 
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
	joined_item 1 refs, 
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
	joined_item 1 refs, 
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
	joined_item 1 refs, 
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
	joined_item 1 refs, 
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
	joined_item 1 refs, 
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }
			batch R 0-1 [0 - 155] {255 255 255 255 }

If you turn on the batching diagnose_frame option you can see the log for your game project.
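
For longer logs, a small script can tally the same information. This is a minimal sketch that assumes the layout of the excerpt above: a line starting with "joined_item N refs" opens an item, and each following "batch" line belongs to it. The function name is hypothetical, and the real log may contain other line types that this ignores.

```python
import re

# Matches lines such as "joined_item 1 refs," (layout assumed from the excerpt).
ITEM_RE = re.compile(r"joined_item\s+(\d+)\s+refs")


def summarize_diagnose_log(log_text):
    """Return a list of (refs, batch_count) tuples, one per joined item."""
    items = []  # each entry: [refs, batch_count]
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = ITEM_RE.match(line)
        if match:
            items.append([int(match.group(1)), 0])
        elif line.startswith("batch") and items:
            items[-1][1] += 1
    return [tuple(item) for item in items]


sample = """canvas_begin FRAME 399
items
    joined_item 1 refs,
            batch R 0-1 [0 - 155] {255 255 255 255 }
            batch R 0-1 [0 - 155] {255 255 255 255 }
            batch R 0-1 [0 - 155] {255 255 255 255 }
    joined_item 1 refs,
            batch R 0-1 [0 - 155] {255 255 255 255 }
            batch R 0-1 [0 - 155] {255 255 255 255 }
            batch R 0-1 [0 - 155] {255 255 255 255 }
"""

summary = summarize_diagnose_log(sample)
print(f"{len(summary)} joined items")
for refs, batches in summary:
    print(f"refs={refs} batches={batches}")
```

A frame where every item reports 1 ref and the per-item batch counts stay as high as in the excerpt matches the "no batching going on" description above.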

Now as to why it is that much slower when it can't batch, I'm not absolutely sure, but part of the explanation could be that rendering/batching/options/single_rect_fallback defaults to off for safety. With it off, batching draws single rects using a safer but slower path, whereas the legacy renderer uses the faster path by default (the safe path is often 2x as slow, but this is usually made up for by other gains).

Yes, I just tested: batching is quite a bit faster with single_rect_fallback turned on, up from 5 to 12 fps for me (still not matching the 80% difference you saw on Android), and this matches the 12 fps I get in legacy on my machine.

You are probably also losing some performance because it is attempting to deal with the overlapping sprites and failing. You can also turn this off in the batching settings, but none of this will get you better performance than legacy if no batching at all is possible.

What is very telling is that if I use the legacy renderer and switch on rendering/2d/options/use_nvidia_rect_flicker_workaround, the frame rate drops to 4 fps for me, which is lower than with batching. This suggests the safe rect path is responsible for the bulk of the difference.

On Android you can probably get by with the safe path disabled in either case (i.e. the workaround off for legacy, or single_rect_fallback on for batching), because I believe the flicker problem has mainly been seen on NVIDIA desktops.

Author

Poobslag commented Oct 7, 2021

Thanks -- I see!

I currently have a game with about 50 overlapping sprites (each character is made up of tons of parts for their mouth, eyes, ears, etc.) and a polygon (their torso is a big oval which can change shape). The unfortunate reality of this design is that I must have batching on to work around issue #19943, but I must have batching off to work around issue #53537.

I think a feasible workaround is the "create a Sprite which snapshots a polygon" approach, so that's what I'm trying for now. But ideally it would be nice if the fix for #19943 did not require sprite batching, or if a fix for #53537 allowed sprite batching to work efficiently on Android.

Member

lawnjelly commented Oct 7, 2021

Well, in fact, if you follow the advice above you should be able to cure most of the performance issues with batching on. I'll try to find the best settings for you, but for now, try switching single_rect_fallback to on and see how that affects things. Also try reducing scissor_area_threshold to 0.1 or so; that may well help too.

Yes the exact best parameters to use will depend on your game, but the three most likely to be important here are:
single_rect_fallback
scissor_area_threshold (may or may not help, depends on your lights etc)
lights/max_join_items
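
These settings are normally changed in the Project Settings dialog, but it can help to see them written out as they might appear on disk. The sketch below formats the two values suggested in this thread as a [rendering] section. The helper name is hypothetical, and the section and key layout is my assumption about the Godot 3.x project.godot format, so check it against your own file before copying anything. lights/max_join_items is left out because no value is suggested for it here.

```python
def render_rendering_section(overrides):
    """Format rendering overrides as a [rendering] section for project.godot.

    Assumption: keys are stored relative to the "rendering/" prefix and
    booleans are written as lowercase true/false.
    """
    prefix = "rendering/"
    lines = ["[rendering]", ""]
    for path, value in overrides.items():
        if not path.startswith(prefix):
            raise ValueError(f"not a rendering setting: {path}")
        key = path[len(prefix):]
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = repr(value)
        lines.append(f"{key}={text}")
    return "\n".join(lines) + "\n"


# Values taken from the suggestions in this thread.
overrides = {
    "rendering/batching/options/single_rect_fallback": True,
    "rendering/batching/lights/scissor_area_threshold": 0.1,
}

section_text = render_rendering_section(overrides)
print(section_text)
```

Keeping the overrides in one dictionary makes it easy to try one setting at a time and record which combination was measured.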

Author

Poobslag commented Oct 7, 2021

Thanks for the suggestions! I've done some experimentation.

With batching disabled, the demo runs at 60 FPS.
With batching enabled and single_rect_fallback enabled, the demo runs at 50-60 FPS.
With batching enabled and single_rect_fallback enabled and a scissor_area_threshold of 0.1, the demo runs at 50-60 FPS.

I see similar behavior in my game -- the single_rect_fallback improves batching performance, but it still runs faster with batching disabled. You're right though, the single_rect_fallback improves batching performance a lot.

@lawnjelly
Member

Also a couple of other options:

  1. If you can make your lights smaller, it may be able to run faster (even possibly with more lights but smaller).
  2. You may be able to take an approach like this with lights:

https://www.youtube.com/watch?v=P6qzGofbNyo

Although it will probably be difficult to get it looking as good (and won't work if you are e.g. using normal maps), this will run shedloads faster in your situation. The main problem I seem to remember was the lack of bit depth when I tried it.

lawnjelly removed the bug label Oct 7, 2021
Author

Poobslag commented Oct 7, 2021

Thanks for the suggestions! My game doesn't use any lights -- it is doing something else which impacts the batching algorithm, but I'm not sure what exactly. I'll continue looking into it over the next few weeks and see what I can find.

@lawnjelly
Member

Closing as lighting performance is unlikely to be fixed, being a limitation of the 2D rendering paradigm.

akien-mga removed this from the 3.x milestone Jun 13, 2024