Add renderer low latency mode #100031

Open
wants to merge 1 commit into base: master

KeyboardDanni
Contributor

Supersedes #94898

More or less implements godotengine/godot-proposals#11200. Instead of using frame_queue_size = 1, though, this PR waits on the GPU using _stall_for_previous_frames() for RenderingDevice, and glFinish() for the Compatibility renderer. This means the swapchain and frame queue don't need to be recreated, which has two advantages:

  1. Low latency mode can be enabled or disabled during runtime, meaning it can be easily added to the in-game settings menu.
  2. Pipelining can be enabled/disabled on a per-frame basis.

The renderer latency mode can be adjusted via the rendering/latency/low_latency_mode project setting, as well as via Engine::set_render_latency_mode() and the --renderer-latency command-line argument. The two options are:

  • RENDER_LATENCY_PRIORITIZE_FRAMERATE / --renderer-latency high_framerate - Tells the renderer to prioritize higher framerate by allowing the CPU to queue up additional frames before they're rendered by the GPU. This allows the CPU and GPU to work in tandem, improving the framerate and framepacing in complex scenes at the expense of input latency.
  • RENDER_LATENCY_PRIORITIZE_LOW_LATENCY / --renderer-latency low_latency - Tells the renderer to prioritize lower display latency by limiting how far the CPU is allowed to go ahead of the GPU when queueing frames. This can greatly help with input lag, at the cost of significantly reduced framerate in complex scenes.
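
To make the tradeoff between the two modes concrete, here is a toy steady-state model in Python. It is only a sketch under simplifying assumptions: fixed CPU and GPU times per frame, no V-Sync, no swapchain, and a hypothetical `frames_in_flight` parameter where 1 stands in for low-latency mode (the CPU waits for the GPU before starting the next frame) and 2 stands in for the default pipelined mode. It is not how the engine measures or schedules anything.

```python
def simulate(cpu_ms, gpu_ms, frames_in_flight, n_frames=50):
    """Toy model: return (frame_time_ms, latency_ms) once steady state is reached.

    frames_in_flight=1: the CPU may not start frame N until the GPU finished N-1.
    frames_in_flight=2: the CPU may run one frame ahead of the GPU.
    """
    cpu_end = []  # time at which the CPU finished submitting each frame
    gpu_end = []  # time at which the GPU finished rendering each frame
    latency = 0.0
    for i in range(n_frames):
        # The CPU is free once it finished the previous frame...
        start = cpu_end[i - 1] if i >= 1 else 0.0
        # ...but it must also wait until an older frame has left the GPU.
        if i >= frames_in_flight:
            start = max(start, gpu_end[i - frames_in_flight])
        cpu_done = start + cpu_ms
        # The GPU starts when the commands are ready and it is idle.
        gpu_start = max(cpu_done, gpu_end[i - 1]) if i >= 1 else cpu_done
        gpu_done = gpu_start + gpu_ms
        cpu_end.append(cpu_done)
        gpu_end.append(gpu_done)
        # Latency here means: CPU starts the frame -> GPU finishes it.
        latency = gpu_done - start
    frame_time = gpu_end[-1] - gpu_end[-2]
    return frame_time, latency


if __name__ == "__main__":
    for k in (1, 2):
        frame_time, latency = simulate(cpu_ms=2, gpu_ms=8, frames_in_flight=k)
        print(f"frames_in_flight={k}: {1000 / frame_time:.0f} FPS, "
              f"{latency:.0f} ms from CPU start to GPU finish")
```

With the made-up timings of 2 ms CPU and 8 ms GPU, the model gives a 10 ms frame time and 10 ms latency without pipelining, versus an 8 ms frame time and 16 ms latency with one frame queued ahead. The numbers are illustrative only; real results depend on the driver, V-Sync, and the swapchain.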

The enum names were chosen deliberately to highlight that a tradeoff is being made between high framerate and low latency. The documentation should also hopefully have ample warning so developers at least know what they're getting into with this setting.

In the future I'd like some sort of dynamic/automatic option which would switch pipelining on/off based on whether it would improve the framerate, inspired by the auto_fps_auto_pipeline Swappy mode available on Android. This would theoretically combine the benefits of both modes, with high performance under load and low latency where available.

The automatic approach has its disadvantages though, and I'd rather do more investigation before introducing an option that tries to be "smart" but ends up causing more problems. At the very least, it'd need to be aware of frame timings on both the CPU and GPU, to determine whether they would fit within one V-Sync period or if pipelining would be needed. AMD's and NVIDIA's low latency SDKs/extensions might be more effective here than trying to come up with our own solution, but there are availability and licensing issues to contend with.
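
As a rough illustration of the "would it fit within one V-Sync period" check described above, a minimal Python sketch could look like the following. The function name, the `margin` parameter, and the idea of simply summing CPU and GPU times are all assumptions made for illustration; this is not part of the PR and not how Swappy or any vendor SDK decides.

```python
def should_pipeline(cpu_ms, gpu_ms, refresh_hz, margin=0.9):
    """Hypothetical heuristic: pipeline only if serialized work misses V-Sync.

    Without pipelining, CPU and GPU work for a frame run back to back, so the
    frame only fits if cpu_ms + gpu_ms stays within one refresh period.
    `margin` leaves some headroom for timing jitter (an arbitrary choice).
    """
    vsync_period_ms = 1000.0 / refresh_hz
    return cpu_ms + gpu_ms > vsync_period_ms * margin


if __name__ == "__main__":
    print(should_pipeline(3, 4, 60))    # light scene on a 60 Hz display
    print(should_pipeline(9, 10, 60))   # heavier scene on a 60 Hz display
    print(should_pipeline(3, 4, 144))   # same light scene at 144 Hz
```

A real implementation would also need smoothing and hysteresis so the mode doesn't flip back and forth every frame, which is part of why more investigation is needed.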

Right now this PR is mainly to test the waters and see if this would be satisfactory for a "low latency mode", or if more work is needed. This is just one of several improvements that need to be made to achieve the ultimate goal of 1 frame without pipelining, or 2-3 frames with pipelining. This PR just lets the user control whether pipelining is enabled or not. Eliminating unnecessary swapchain waits before drawing, and ensuring each renderer uses direct scanout (DXGI on Windows), are the two other main improvements to be made.

With this change, RenderingDevice saves 1 frame (at default settings) and OpenGL can save up to 2 frames (depending on drivers etc, YMMV). That means it's possible to achieve 1 frame of display latency in OpenGL, and 2 frames in Vulkan/D3D12, though at the cost of significantly reduced framerate in complex scenes. But if the game in question is a 2D game with simple graphics, there is very little risk from enabling this mode.

@Capewearer

Sounds great. Could you explain how it would interact with double/triple buffering? It feels like this turns that exact feature on or off.

@KeyboardDanni
Contributor Author

By double/triple buffering, do you mean the frame queue, or the swapchain? When enabled, this effectively forces a single-buffered frame queue, as if you set frame_queue_size to 1, but without actually using a smaller frame queue (size remains the same, just doesn't get filled). Right now there's no way to set frame_queue_size to 1 without modifying the engine, so with this change you can bring the latency one frame lower than you could otherwise.

The swapchain is unaffected, so if you want the lowest latency with standard (FIFO) V-Sync you'll need to set swapchain_image_count to 2.

@Capewearer

Yes, that's what I was really asking. Thanks for the explanation!

Member

@Calinou Calinou left a comment

Tested locally (rebased on top of master 36d90c7), it works as expected.

Benchmark

PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 41)

Latency

With both Vulkan and OpenGL, you avoid between 1 and 2 frames of latency by using --renderer-latency low_latency. On a 60 Hz display, this means a reduction of input lag between 16.7 and 33.3 ms, which is already noticeable when playing with a controller, but very significant when playing with a keyboard and mouse.
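
The millisecond figures above follow directly from the refresh period. A small Python sketch of the conversion (the helper name is just for illustration):

```python
def frames_to_ms(frames, refresh_hz):
    """Convert a latency measured in display frames to milliseconds."""
    return frames * 1000.0 / refresh_hz


if __name__ == "__main__":
    for hz in (60, 120, 240):
        one = frames_to_ms(1, hz)
        two = frames_to_ms(2, hz)
        print(f"{hz} Hz: 1 frame = {one:.1f} ms, 2 frames = {two:.1f} ms")
```

The same saving of 1 to 2 frames is worth fewer milliseconds as the refresh rate goes up, since each frame covers a shorter period.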

Quoting from KeyboardDanni/godot-latency-tester#1:

Default settings

Vulkan, vsync enabled, frame_queue_size = 2, swapchain_image_count = 3:

  • compositing enabled, windowed, no present wait = 5 frames
  • compositing disabled, windowed, no present wait = 4 frames
  • compositing disabled, fullscreen, no present wait = 4 frames

OpenGL:

  • compositing enabled, windowed, no present wait = 4 frames
  • compositing disabled, windowed, no present wait = 2 frames
  • compositing disabled, fullscreen, no present wait = 2 frames

With --renderer-latency low_latency command line argument

Vulkan, vsync enabled:

  • compositing enabled, windowed, no present wait = 4 frames
  • compositing disabled, windowed, no present wait = 3 frames
  • compositing disabled, fullscreen, no present wait = 3 frames

OpenGL:

  • compositing enabled, windowed, no present wait = 2-3 frames
  • compositing disabled, windowed, no present wait = 1 frame
  • compositing disabled, fullscreen, no present wait = 1 frame

Waitable swapchains could bring further improvements that stack with this PR's functionality, but I haven't investigated them yet.

Performance

Using the 3D Platformer demo with an optimized editor binary. X11 is used with compositing disabled.

| Renderer and resolution | --renderer-latency high_framerate | --renderer-latency low_latency |
|---|---|---|
| Vulkan 1152×648 window | 1636 FPS (0.61 mspf) | 941 FPS (1.06 mspf) |
| Vulkan 3840×2160 fullscreen | 769 FPS (1.30 mspf) | 357 FPS (2.72 mspf) |
| OpenGL 1152×648 window | 2094 FPS (0.47 mspf) | 881 FPS (1.13 mspf) |
| OpenGL 3840×2160 fullscreen | 1208 FPS (0.82 mspf) | 813 FPS (1.23 mspf) |

While these reductions sound significant, the framerates with --renderer-latency low_latency are still plenty to satisfy the refresh rates of 240 Hz or even 360 Hz displays. Pair this with a framerate cap (--max-fps) and you'll get a consistent low-latency experience.

In a more demanding project like the TPS demo, using --renderer-latency low_latency reduces the average framerate from 105 FPS to 53 FPS at 3840×2160. For graphics-heavy games, the framerate reduction is usually too severe to enable this feature.

Feedback

  • Should we use low-latency mode in the editor automatically? I think we can make it always be used in the 2D editor, script editor and AssetLib tabs. In the 3D editor, it would be enabled whenever the 3D editor viewport doesn't have focus (i.e. the mouse cursor isn't on the 3D editor viewport). This would avoid skewing performance measurements with the View Frame Time panel.
    • There could be an editor setting to override this behavior, with the values Auto (default), Never, Always.

@KeyboardDanni
Contributor Author

I don't see much use case for having this in the editor, unless there's some weird crossover caused by the new game embedding feature.

Worth noting that the low latency gains here diminish significantly once you go beyond 120 Hz. This mode is primarily intended for 2D or lightweight 3D games running on 60 Hz displays. Also worth noting that NVIDIA Reflex and AMD Anti-Lag have driver-level knowledge that could minimize the FPS impact (since they're aware of the current CPU/GPU timings and can optimize for that), but this PR is at least a start.

@Calinou
Member

Calinou commented Feb 8, 2025

We have a lot of issues related to input lag in the editor, such as #71795.

Disabling V-Sync avoids these issues, but it introduces tearing (which is mostly an issue for the 2D/3D viewports, and probably not much of a problem in the script editor where scrolling is mostly vertical). Also, regardless of V-Sync, we can get further improvements to latency by reducing frame queuing.

In general, non-game applications are the area where I'd expect the low-latency mode to be most useful, and the Godot editor is a non-game application made with Godot 🙂

@KeyboardDanni
Contributor Author

Discussed with @Calinou and we'll probably handle low latency in the editor in a followup PR.
