Add renderer low latency mode #100031

KeyboardDanni · 2024-12-05T02:00:04Z

Supersedes #94898

More or less implements godotengine/godot-proposals#11200. Instead of using frame_queue_size = 1, though, this PR waits on the GPU using _stall_for_previous_frames() for RenderingDevice, and glFinish() for the Compatibility renderer. This means the swapchain and frame queue don't need to be recreated, which has two advantages:

- Low latency mode can be enabled or disabled at runtime, meaning it can easily be added to an in-game settings menu.
- Pipelining can be enabled/disabled on a per-frame basis.
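To make the "queue stays the same size, it just doesn't get filled" idea concrete, here is a toy model of the frame loop. It is only a sketch: `FakeGpu` and `max_frames_in_flight()` are hypothetical names invented for illustration and are not part of Godot, and `wait_idle()` merely stands in for the kind of stall that _stall_for_previous_frames() or glFinish() performs.

```cpp
#include <algorithm>
#include <cstdint>

// Toy stand-in for a GPU that retires one queued frame per tick.
// Hypothetical type, not Godot API.
struct FakeGpu {
    uint32_t in_flight = 0;

    void submit() { ++in_flight; }
    void tick() {
        if (in_flight > 0) {
            --in_flight;
        }
    }
    // Stand-in for stalling until every previously submitted frame is done.
    void wait_idle() { in_flight = 0; }
};

// Simulates `frames` iterations of a render loop and returns the largest
// number of frames that were queued on the GPU at the same time.
// `frame_queue_size` is never changed by low latency mode; the CPU simply
// drains the GPU before it starts the next frame.
uint32_t max_frames_in_flight(uint32_t frames, uint32_t frame_queue_size, bool low_latency) {
    const uint32_t queue_size = std::max<uint32_t>(frame_queue_size, 1); // Avoid a zero-sized queue.
    FakeGpu gpu;
    uint32_t peak = 0;
    for (uint32_t i = 0; i < frames; ++i) {
        if (low_latency) {
            gpu.wait_idle();
        } else {
            // Regular pipelining: block only when the queue is full.
            while (gpu.in_flight >= queue_size) {
                gpu.tick();
            }
        }
        gpu.submit();
        peak = std::max(peak, gpu.in_flight);
    }
    return peak;
}
```

In this model, a queue size of 2 peaks at 2 frames in flight with pipelining and at 1 frame with the low latency stall, and the mode can be flipped between any two iterations without rebuilding anything.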

The renderer latency mode can be adjusted via the rendering/latency/low_latency_mode project setting, as well as via Engine::set_render_latency_mode() and the --renderer-latency command-line argument. The two options are:

- RENDER_LATENCY_PRIORITIZE_FRAMERATE / --renderer-latency high_framerate - Tells the renderer to prioritize higher framerate by allowing the CPU to queue up additional frames before they're rendered by the GPU. This allows the CPU and GPU to work in tandem, improving the framerate and frame pacing in complex scenes at the expense of input latency.
- RENDER_LATENCY_PRIORITIZE_LOW_LATENCY / --renderer-latency low_latency - Tells the renderer to prioritize lower display latency by limiting how far the CPU is allowed to go ahead of the GPU when queueing frames. This can greatly help with input lag, at the cost of significantly reduced framerate in complex scenes.
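As a rough illustration of how the two command-line values could map onto the two enum values, here is a standalone sketch. The enum constant names and the `high_framerate` / `low_latency` strings come from the description above; the enum type name `RenderLatencyMode` and the `parse_renderer_latency()` helper are assumptions made for this example and are not the code in main/main.cpp.

```cpp
#include <string>

// Constant names as given in the PR description; the enum type name itself
// is an assumption for this sketch.
enum RenderLatencyMode {
    RENDER_LATENCY_PRIORITIZE_FRAMERATE,
    RENDER_LATENCY_PRIORITIZE_LOW_LATENCY,
};

// Hypothetical helper: maps the value passed to --renderer-latency onto a
// mode. Returns false and leaves `r_mode` untouched for unknown values.
bool parse_renderer_latency(const std::string &p_value, RenderLatencyMode &r_mode) {
    if (p_value == "high_framerate") {
        r_mode = RENDER_LATENCY_PRIORITIZE_FRAMERATE;
        return true;
    }
    if (p_value == "low_latency") {
        r_mode = RENDER_LATENCY_PRIORITIZE_LOW_LATENCY;
        return true;
    }
    return false;
}
```

Leaving the mode untouched on an unknown value means a typo on the command line falls back to whatever the project setting already selected, which seemed like the least surprising behavior for a sketch.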

The enum names were chosen deliberately to highlight that a tradeoff is being made between high framerate and low latency. The documentation should also hopefully have ample warning so developers at least know what they're getting into with this setting.

In the future I'd like some sort of dynamic/automatic option which would switch pipelining on/off based on whether it would improve the framerate, inspired by the auto_fps_auto_pipeline Swappy mode available on Android. This would theoretically combine the benefits of both modes, with high performance under load and low latency where available.

The automatic approach has its disadvantages though, and I'd rather do more investigation before introducing an option that tries to be "smart" but ends up causing more problems. At the very least, it'd need to be aware of frame timings on both the CPU and GPU, to determine whether they would fit within one V-Sync period or if pipelining would be needed. AMD's and NVIDIA's low latency SDKs/extensions might be more effective here than trying to come up with our own solution, but there are availability and licensing issues to contend with.
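The "fits within one V-Sync period" check could look roughly like the sketch below. This is not part of the PR: `needs_pipelining()` is a hypothetical name, and the assumption that CPU and GPU time simply add up when pipelining is off is a simplification that ignores timing jitter, presentation overhead and hysteresis, all of which a real automatic mode would have to deal with.

```cpp
// Hypothetical heuristic: returns true when CPU and GPU work done back to
// back would overrun one V-Sync period, meaning pipelining would be needed
// to keep up with the display's refresh rate.
bool needs_pipelining(double p_cpu_ms, double p_gpu_ms, double p_refresh_hz) {
    const double vsync_period_ms = 1000.0 / p_refresh_hz;
    return p_cpu_ms + p_gpu_ms > vsync_period_ms;
}
```

For example, 4 ms of CPU time plus 6 ms of GPU time fits in a 60 Hz period (about 16.7 ms) but not in a 144 Hz period (about 6.9 ms), so the same scene could want different modes on different displays.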

Right now this PR is mainly to test the waters and see if this would be satisfactory for a "low latency mode", or if more work is needed. This is just one of several improvements that need to be made to achieve the ultimate goal of 1 frame without pipelining, or 2-3 frames with pipelining. This PR just lets the user control whether pipelining is enabled or not. Eliminating unnecessary swapchain waits before drawing, and ensuring each renderer uses direct scanout (DXGI on Windows), are the two other main improvements to be made.

With this change, RenderingDevice saves 1 frame (at default settings) and OpenGL can save up to 2 frames (depending on drivers etc, YMMV). That means it's possible to achieve 1 frame of display latency in OpenGL, and 2 frames in Vulkan/D3D12, though at the cost of significantly reduced framerate in complex scenes. But if the game in question is a 2D game with simple graphics, there is very little risk from enabling this mode.

Capewearer · 2024-12-07T00:14:57Z

Sounds great! Could you explain how it would interact with double/triple buffering? It feels like this is turning that exact feature on or off.

KeyboardDanni · 2024-12-07T00:26:10Z

By double/triple buffering, do you mean the frame queue, or the swapchain? When enabled, this effectively forces a single-buffered frame queue, as if you set frame_queue_size to 1, but without actually using a smaller frame queue (size remains the same, just doesn't get filled). Right now there's no way to set frame_queue_size to 1 without modifying the engine, so with this change you can bring the latency one frame lower than you could otherwise.

The swapchain is unaffected, so if you want the lowest latency with standard (FIFO) V-Sync you'll need to set swapchain_image_count to 2.

Capewearer · 2024-12-07T00:34:06Z

> By double/triple buffering, do you mean the frame queue, or the swapchain? When enabled, this effectively forces a single-buffered frame queue, as if you set frame_queue_size to 1, but without actually using a smaller frame queue (size remains the same, just doesn't get filled). Right now there's no way to set frame_queue_size to 1 without modifying the engine, so with this change you can bring the latency one frame lower than you could otherwise.
>
> The swapchain is unaffected, so if you want the lowest latency with standard (FIFO) V-Sync you'll need to set swapchain_image_count to 2.

Yes, that's what I was really asking. Thanks for the explanation!

Calinou

Tested locally (rebased on top of master 36d90c7), it works as expected.

Benchmark

PC specifications

CPU: Intel Core i9-13900K
GPU: NVIDIA GeForce RTX 4090
RAM: 64 GB (2×32 GB DDR5-5800 C30)
SSD: Solidigm P44 Pro 2 TB
OS: Linux (Fedora 41)

Latency

With both Vulkan and OpenGL, you avoid between 1 and 2 frames of latency by using --renderer-latency low_latency. On a 60 Hz display, this means a reduction in input lag of between 16.7 and 33.3 ms, which is already noticeable when playing with a controller, and very significant when playing with a keyboard and mouse.

Quoting from KeyboardDanni/godot-latency-tester#1:

Default settings

Vulkan, vsync enabled, frame_queue_size = 2, swapchain_image_count = 3:

compositing enabled, windowed, no present wait = 5 frames

compositing disabled, windowed, no present wait = 4 frames

compositing disabled, fullscreen, no present wait = 4 frames

OpenGL:

compositing enabled, windowed, no present wait = 4 frames

compositing disabled, windowed, no present wait = 2 frames

compositing disabled, fullscreen, no present wait = 2 frames

With --renderer-latency low_latency command line argument

Vulkan, vsync enabled:

compositing enabled, windowed, no present wait = 4 frames

compositing disabled, windowed, no present wait = 3 frames

compositing disabled, fullscreen, no present wait = 3 frames

OpenGL:

compositing enabled, windowed, no present wait = 2-3 frames

compositing disabled, windowed, no present wait = 1 frame

compositing disabled, fullscreen, no present wait = 1 frame

Waitable swapchains could bring further improvements that stack with this PR's functionality, but I didn't investigate them yet:

Performance

Using the 3D Platformer demo with an optimized editor binary. X11 is used with compositing disabled.

| Latency | `high_framerate` | `low_latency` |
| --- | --- | --- |
| Vulkan 1152×648 window | 1636 FPS (0.61 mspf) | 941 FPS (1.06 mspf) |
| Vulkan 3840×2160 fullscreen | 769 FPS (1.30 mspf) | 357 FPS (2.72 mspf) |
| OpenGL 1152×648 window | 2094 FPS (0.47 mspf) | 881 FPS (1.13 mspf) |
| OpenGL 3840×2160 fullscreen | 1208 FPS (0.82 mspf) | 813 FPS (1.23 mspf) |

While these reductions sound significant, the framerates with --renderer-latency low_latency are still plenty to satisfy the refresh rates of 240 Hz or even 360 Hz displays. Pair this with a framerate cap (--max-fps) and you'll get a consistent low-latency experience.

In a more demanding project like the TPS demo, using --renderer-latency low_latency will reduce the average framerate from 105 FPS to 53 FPS at 3840×2160. For graphics-heavy games, the framerate reduction is usually too severe to enable this feature.

Feedback

Should we use low-latency mode in the editor automatically? I think we can make it always be used in the 2D editor, script editor and AssetLib tabs. In the 3D editor, it would be enabled whenever the 3D editor viewport doesn't have focus (i.e. the mouse cursor isn't on the 3D editor viewport). This would avoid skewing performance measurements with the View Frame Time panel.
- There could be an editor setting to override this behavior, with the values Auto (default), Never, Always.

KeyboardDanni · 2025-02-08T19:18:51Z

> Feedback
>
> Should we use low-latency mode in the editor automatically? I think we can make it always be used in the 2D editor, script editor and AssetLib tabs. In the 3D editor, it would be enabled whenever the 3D editor viewport doesn't have focus (i.e. the mouse cursor isn't on the 3D editor viewport). This would avoid skewing performance measurements with the View Frame Time panel.
>
> There could be an editor setting to override this behavior, with the values Auto (default), Never, Always.

I don't see much of a use case for having this in the editor, unless there's some weird crossover caused by the new game embedding feature.

Worth noting that the low latency gains here diminish significantly once you go beyond 120 Hz - this mode is primarily intended for 2D or lightweight 3D games running on 60 Hz displays. Also worth noting that NVIDIA Reflex and AMD Anti-Lag have driver-level knowledge that could minimize the FPS impact (since they're aware of the current CPU/GPU timings and can optimize for that), but this PR is at least a start.

Calinou · 2025-02-08T19:51:51Z

> I don't see much use case for having this in the editor, unless there's some weird crossover caused by the new game embedding feature.

We have a lot of issues related to input lag in the editor, such as #71795.

Disabling V-Sync avoids these issues, but it introduces tearing (which is mostly an issue for the 2D/3D viewports – probably not much of a problem in the script editor where scrolling is mostly vertical). Also, regardless of V-Sync, we can get further improvements to latency by reducing frame queuing.

In general, non-game applications are the area where I'd expect the low-latency mode to be most useful – and the Godot editor is a non-game application made with Godot 🙂

KeyboardDanni · 2025-02-08T22:02:47Z

Discussed with @Calinou and we'll probably handle low latency in the editor in a followup PR.

KeyboardDanni requested review from a team as code owners December 5, 2024 02:00

KeyboardDanni mentioned this pull request Dec 5, 2024

Allow frame_queue_size=1 for reduced input lag #94898

Closed

Mickeon added enhancement topic:rendering topic:core labels Dec 5, 2024

Mickeon added this to the 4.x milestone Dec 5, 2024

Calinou added the performance label Dec 5, 2024

Calinou reviewed Feb 8, 2025

Add renderer low latency mode

d2593ba

KeyboardDanni force-pushed the low-latency-mode branch from 1b981ff to d2593ba Compare February 8, 2025 20:11
