Add renderer low latency mode #100031
Sounds very great, could you explain how it would interact with double/triple buffering? Because it feels like turning off/on this exact feature. |
By double/triple buffering, do you mean the frame queue, or the swapchain? When enabled, this effectively forces a single-buffered frame queue, as if you set The swapchain is unaffected, so if you want the lowest latency with standard (FIFO) V-Sync you'll need to set |
Yes, that's what I really asked. Thanks for explanation! |
Tested locally (rebased on top of master 36d90c7), it works as expected.
Benchmark
PC specifications
- CPU: Intel Core i9-13900K
- GPU: NVIDIA GeForce RTX 4090
- RAM: 64 GB (2×32 GB DDR5-5800 C30)
- SSD: Solidigm P44 Pro 2 TB
- OS: Linux (Fedora 41)
Latency
With both Vulkan and OpenGL, you avoid between 1 and 2 frames of latency by using --renderer-latency low_latency. On a 60 Hz display, this means a reduction of input lag between 16.7 and 33.3 ms, which is already noticeable when playing with a controller, but very significant when playing with a keyboard and mouse.
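The frame-to-millisecond conversion behind these figures can be sketched as follows. This is only illustrative arithmetic; the function name is hypothetical and not part of Godot or this PR.

```python
def frames_to_ms(frames: float, refresh_hz: float) -> float:
    """Convert a latency measured in display frames to milliseconds."""
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be positive")
    return frames * 1000.0 / refresh_hz


# Saving 1 to 2 frames at a few common refresh rates.
for hz in (60, 120, 240):
    low, high = frames_to_ms(1, hz), frames_to_ms(2, hz)
    print(f"{hz} Hz: {low:.1f} to {high:.1f} ms saved")
# 60 Hz: 16.7 to 33.3 ms saved
# 120 Hz: 8.3 to 16.7 ms saved
# 240 Hz: 4.2 to 8.3 ms saved
```

The same number of saved frames is worth fewer milliseconds as the refresh rate goes up, so the absolute gain shrinks on faster displays.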
Quoting from KeyboardDanni/godot-latency-tester#1:
Default settings
Vulkan, vsync enabled, frame_queue_size = 2, swapchain_image_count = 3:
- compositing enabled, windowed, no present wait = 5 frames
- compositing disabled, windowed, no present wait = 4 frames
- compositing disabled, fullscreen, no present wait = 4 frames
OpenGL:
- compositing enabled, windowed, no present wait = 4 frames
- compositing disabled, windowed, no present wait = 2 frames
- compositing disabled, fullscreen, no present wait = 2 frames
With --renderer-latency low_latency command line argument
Vulkan, vsync enabled:
- compositing enabled, windowed, no present wait = 4 frames
- compositing disabled, windowed, no present wait = 3 frames
- compositing disabled, fullscreen, no present wait = 3 frames
OpenGL:
- compositing enabled, windowed, no present wait = 2-3 frames
- compositing disabled, windowed, no present wait = 1 frame
- compositing disabled, fullscreen, no present wait = 1 frame
Waitable swapchains could bring further improvements that stack with this PR's functionality, but I didn't investigate them yet:
- RenderingDevice: Wait for present if supported (Vulkan Windows/X11/Wayland - needs testing) #94973
- [RFC] D3D12: Use waitable swap chain to reduce input latency #94960
Performance
Using the 3D Platformer demo with an optimized editor binary. X11 is used with compositing disabled.
Latency | high_framerate | low_latency |
---|---|---|
Vulkan 1152×648 window | 1636 FPS (0.61 mspf) | 941 FPS (1.06 mspf) |
Vulkan 3840×2160 fullscreen | 769 FPS (1.30 mspf) | 357 FPS (2.72 mspf) |
OpenGL 1152×648 window | 2094 FPS (0.47 mspf) | 881 FPS (1.13 mspf) |
OpenGL 3840×2160 fullscreen | 1208 FPS (0.82 mspf) | 813 FPS (1.23 mspf) |
While these reductions sound significant, the framerates with --renderer-latency low_latency are still plenty to satisfy the refresh rates of 240 Hz or even 360 Hz displays. Pair this with a framerate cap (--max-fps) and you'll get a consistent low-latency experience.
In a more demanding project like the TPS demo, using --renderer-latency low_latency reduces the average framerate from 105 FPS to 53 FPS at 3840×2160. For graphics-heavy games, the framerate reduction is usually too severe to enable this feature.
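To check whether a measured framerate still keeps up with a given display, the comparison can be sketched like this. The helper names are hypothetical; the input numbers are taken from the benchmark results above.

```python
def frame_time_ms(fps: float) -> float:
    """Milliseconds per frame (mspf) for a given framerate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def keeps_up_with_display(fps: float, refresh_hz: float) -> bool:
    """True if each frame is rendered within one refresh interval."""
    return frame_time_ms(fps) <= 1000.0 / refresh_hz


def framerate_drop_percent(before_fps: float, after_fps: float) -> float:
    """Relative framerate reduction, in percent."""
    return (before_fps - after_fps) / before_fps * 100.0


# Vulkan, 1152x648 window, low_latency
print(f"{frame_time_ms(941):.2f} mspf")           # 1.06 mspf
print(keeps_up_with_display(941, 360))            # True

# TPS demo at 3840x2160: 105 FPS -> 53 FPS
print(keeps_up_with_display(53, 60))              # False
print(f"{framerate_drop_percent(105, 53):.1f}%")  # 49.5%
```

The roughly 50% drop in the TPS demo is what one would expect if CPU and GPU work per frame are of similar size and no longer overlap, though that reading is an assumption rather than something measured here.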
Feedback
- Should we use low-latency mode in the editor automatically? I think we can make it always be used in the 2D editor, script editor and AssetLib tabs. In the 3D editor, it would be enabled whenever the 3D editor viewport doesn't have focus (i.e. the mouse cursor isn't on the 3D editor viewport). This would avoid skewing performance measurements with the View Frame Time panel.
- There could be an editor setting to override this behavior, with the values Auto (default), Never, Always.
I don't see much use case for having this in the editor, unless there's some weird crossover caused by the new game embedding feature. Worth noting that the low latency gains here diminish significantly once you go beyond 120 Hz - this mode is primarily intended for 2D or lightweight 3D games running on 60 Hz displays. Also worth noting that NVIDIA Reflex and AMD Anti-Lag have driver-level knowledge that could minimize the FPS impact (since they're aware of the current CPU/GPU timings and can optimize for that), but this PR is at least a start.
We have a lot of issues related to input lag in the editor, such as #71795. Disabling V-Sync avoids these issues, but it introduces tearing (which is mostly an issue for the 2D/3D viewports – probably not much of a problem in the script editor where scrolling is mostly vertical). Also, regardless of V-sync, we can get further improvements to latency by reducing frame queuing. In general, non-game applications are the area where I'd expect the low-latency mode to be most useful – and the Godot editor is a non-game application made with Godot 🙂 |
Discussed with @Calinou and we'll probably handle low latency in the editor in a followup PR. |
Supersedes #94898
More or less implements godotengine/godot-proposals#11200. Instead of using frame_queue_size = 1 though, this PR waits on the GPU using _stall_for_previous_frames() for RenderingDevice, and glFinish() for the Compatibility renderer. This means the swapchain and frame queue don't need to be recreated, which has two advantages:
The renderer latency mode can be adjusted via the rendering/latency/low_latency_mode project setting, as well as via Engine::set_render_latency_mode() and the --renderer-latency command-line argument. The two options are:
- RENDER_LATENCY_PRIORITIZE_FRAMERATE / --renderer-latency high_framerate - Tells the renderer to prioritize higher framerate by allowing the CPU to queue up additional frames before they're rendered by the GPU. This allows the CPU and GPU to work in tandem, improving the framerate and framepacing in complex scenes at the expense of input latency.
- RENDER_LATENCY_PRIORITIZE_LOW_LATENCY / --renderer-latency low_latency - Tells the renderer to prioritize lower display latency by limiting how far the CPU is allowed to go ahead of the GPU when queueing frames. This can greatly help with input lag, at the cost of significantly reduced framerate in complex scenes.
The enum names were chosen deliberately to highlight that a tradeoff is being made between high framerate and low latency. The documentation should also hopefully have ample warning so developers at least know what they're getting into with this setting.
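The tradeoff between the two modes can be illustrated with a toy model of a CPU/GPU frame loop. This is a simplified sketch and not Godot's renderer: it ignores V-Sync and the swapchain, assumes constant per-frame CPU and GPU costs, and the `frames_ahead` parameter is a hypothetical stand-in for how far the CPU may run ahead of the GPU.

```python
def simulate(cpu_ms: float, gpu_ms: float, frames_ahead: int, n_frames: int = 200):
    """Toy CPU/GPU frame loop (assumed model, not engine code).

    frames_ahead: how many extra frames the CPU may start before the GPU
    has finished older ones (0 = wait for the GPU after every frame).
    Returns (average frame interval, latency of the last frame) in ms,
    where latency spans from CPU start to GPU completion of that frame.
    """
    cpu_end, gpu_end = [], []
    latency = 0.0
    for i in range(n_frames):
        prev_cpu = cpu_end[i - 1] if i >= 1 else 0.0
        # The CPU stalls until the frame that is (frames_ahead + 1)
        # submissions old has been completed by the GPU.
        j = i - 1 - frames_ahead
        gate = gpu_end[j] if j >= 0 else 0.0
        cpu_start = max(prev_cpu, gate)
        cpu_end.append(cpu_start + cpu_ms)
        # The GPU processes frames in order, one at a time.
        prev_gpu = gpu_end[i - 1] if i >= 1 else 0.0
        gpu_start = max(cpu_end[i], prev_gpu)
        gpu_end.append(gpu_start + gpu_ms)
        latency = gpu_end[i] - cpu_start
    interval = (gpu_end[-1] - gpu_end[0]) / (n_frames - 1)
    return interval, latency


# GPU-bound scene: 1 ms of CPU work, 4 ms of GPU work per frame.
print(simulate(1.0, 4.0, frames_ahead=1))  # (4.0, 8.0): faster frames, more latency
print(simulate(1.0, 4.0, frames_ahead=0))  # (5.0, 5.0): slower frames, less latency

# Balanced scene: the frame interval doubles when the CPU cannot run ahead.
print(simulate(4.0, 4.0, frames_ahead=1)[0])  # 4.0
print(simulate(4.0, 4.0, frames_ahead=0)[0])  # 8.0
```

In this model the frame interval is roughly max(CPU, GPU) when the CPU may run ahead and CPU + GPU when it may not, which is why lightweight scenes lose little framerate while balanced, heavy scenes can lose about half.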
In the future I'd like some sort of dynamic/automatic option which would switch pipelining on/off based on whether it would improve the framerate, inspired by the auto_fps_auto_pipeline Swappy mode available on Android. This would theoretically combine the benefits of both modes, with high performance under load and low latency where available.
The automatic approach has its disadvantages though, and I'd rather do more investigation before introducing an option that tries to be "smart" but ends up causing more problems. At the very least, it'd need to be aware of frame timings on both the CPU and GPU, to determine whether they would fit within one V-Sync period or if pipelining would be needed. AMD and NVIDIA's low latency SDKs/extensions might be more effective here than trying to come up with our own solution, but there are availability/licensing issues to contend with.
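The "fits within one V-Sync period" check described above could look roughly like the sketch below. This is a hypothetical heuristic written for illustration; it is not how Swappy or this PR decides, and both the function name and the `safety_margin` parameter are assumptions.

```python
def should_pipeline(cpu_ms: float, gpu_ms: float, refresh_hz: float,
                    safety_margin: float = 0.9) -> bool:
    """Decide whether CPU and GPU work need to overlap (hypothetical rule).

    If CPU and GPU work done back to back fits inside one V-Sync period
    (scaled by an assumed safety margin), pipelining is not needed and
    the lower-latency serialized mode can be used.
    """
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be positive")
    vsync_period_ms = 1000.0 / refresh_hz
    return cpu_ms + gpu_ms > vsync_period_ms * safety_margin


print(should_pipeline(3.0, 5.0, 60))   # False: 8 ms fits in a 16.7 ms period
print(should_pipeline(9.0, 9.0, 60))   # True: 18 ms does not fit
print(should_pipeline(3.0, 5.0, 144))  # True: 8 ms exceeds a 6.9 ms period
```

A real implementation would also need smoothed timings and some hysteresis so the mode doesn't flip back and forth every few frames, which is part of why the approach needs more investigation.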
Right now this PR is mainly to test the waters and see if this would be satisfactory for a "low latency mode", or if more work is needed. This is just one of several improvements that need to be made to achieve the ultimate goal of 1 frame without pipelining, or 2-3 frames with pipelining. This PR just lets the user control whether pipelining is enabled or not. Eliminating unnecessary swapchain waits before drawing, and ensuring each renderer uses direct scanout (DXGI on Windows), are the two other main improvements to be made.
With this change, RenderingDevice saves 1 frame (at default settings) and OpenGL can save up to 2 frames (depending on drivers etc, YMMV). That means it's possible to achieve 1 frame of display latency in OpenGL, and 2 frames in Vulkan/D3D12, though at the cost of significantly reduced framerate in complex scenes. But if the game in question is a 2D game with simple graphics, there is very little risk from enabling this mode.