
Bump MAX_FEC_BLOCKS to 4 #2787

Closed
wants to merge 1 commit into from

Conversation

CypherGrue

@CypherGrue CypherGrue commented Jul 1, 2024

Description

Sunshine implementation arbitrarily limits itself to 3 FEC blocks even though the protocol supports 4. This is fine for typical use cases. However, when dealing with large payloads (low FPS, high bitrate), there is a risk that the resulting packet size exceeds the capability of the error correction code.

This change removes magic constants to make full use of the error-correcting bandwidth defined by the protocol and supported by Moonlight.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Dependency update (updates to dependencies)
  • Documentation update (changes to documentation)
  • Repository update (changes to repository files, e.g. .github/...)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch must be updated before it can be merged. You must also allow edits from maintainers.

  • I want maintainers to keep my branch updated

Sunshine implementation arbitrarily limits itself to 3 FEC blocks even though the protocol supports 4. This is fine for most use cases. However, when dealing with large payloads (low FPS, high bitrate), there is a risk that the resulting packet size exceeds the capability of the error correction code.

This change removes magic constants to make full use of the protocol.
@CLAassistant

CLAassistant commented Jul 1, 2024

CLA assistant check
All committers have signed the CLA.

@cgutman
Collaborator

cgutman commented Jul 1, 2024

We should do something like #1466 instead of this. The real problem is that our FEC groups are hardcoded rather than dynamic as they would be in that PR.

The issue is that proper FEC calculation is uncovering our longstanding issues with exhaustion of packet buffers in switches and NICs, so we need some packet pacing changes to support this properly.

@radugrecu97

This comment was marked as off-topic.

@ns6089
Contributor

ns6089 commented Jul 2, 2024

@cgutman If feedback-based packet pacing is not ready, maybe in the meantime we can hardcode 1GbE pacing? This should resolve most of the local streaming problems without adding a significant amount of latency.

@cgutman
Collaborator

cgutman commented Jul 3, 2024

Yeah, static pacing would probably be enough to unblock #1466

@ns6089
Contributor

ns6089 commented Jul 3, 2024

Yeah, static pacing would probably be enough

We have

bool send_batch(batched_send_info_t &send_info);
bool send(send_info_t &send_info);

in platform/common.h, if we add

bool send_batch_paced(batched_send_info_t &send_info,
                      size_t pacing_block_size,
                      std::chrono::nanoseconds pacing_block_duration);

In your opinion, could it be reused later on for dynamic pacing, or would you prefer a different interface?

@ns6089
Contributor

ns6089 commented Jul 3, 2024

Or even something as simple as

bool send_batch_paced(batched_send_info_t &send_info, double bytes_per_second);

I don't think we want to do busy-waiting, so we will have to rely on sleeps with limited precision. The overall pacing will look like this:

  1. Dump 1ms worth of data
  2. Sleep for 1ms
  3. Check how long we actually slept and adjust next batch size
  4. Dump, sleep and repeat
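A minimal sketch of that loop, assuming a hypothetical send_chunk helper for the actual transmission (none of these names come from the Sunshine code base):

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>

// Placeholder for whatever actually transmits `len` bytes starting at `offset`.
void send_chunk(std::size_t offset, std::size_t len);

// Sketch of sleep-based pacing: send ~1 ms worth of data, sleep for 1 ms,
// then size the next batch based on how long we actually slept.
void paced_send(double bytes_per_second, std::size_t total_bytes) {
  using namespace std::chrono;
  std::size_t sent = 0;
  auto batch_bytes = static_cast<std::size_t>(bytes_per_second / 1000.0);  // ~1 ms of data

  while (sent < total_bytes) {
    auto before = steady_clock::now();

    std::size_t n = std::min(batch_bytes, total_bytes - sent);
    send_chunk(sent, n);
    sent += n;

    std::this_thread::sleep_for(milliseconds(1));

    // Step 3 above: measure the real elapsed time (sleep overshoot included)
    // and resize the next batch to keep the average rate on target.
    auto elapsed = duration_cast<duration<double>>(steady_clock::now() - before).count();
    batch_bytes = static_cast<std::size_t>(bytes_per_second * elapsed);
  }
}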

@ns6089
Contributor

ns6089 commented Jul 3, 2024

After looking more deeply into this, we may have to resort to busy spin waiting after all.
RX/TX buffers seem to be more limited than I previously thought; we probably shouldn't batch more than 100 packets at once, maybe even fewer.
So yeah, this brings us back to the original function

bool send_batch_paced(batched_send_info_t &send_info,
                      size_t pacing_block_size_in_packets,
                      std::chrono::nanoseconds pacing_block_duration);

@cgutman If you're ok with this implementation, I can work on it.
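A rough outline of what that could look like. Everything here is illustrative: packet_count, send_sub_batch, and spin_wait are stand-ins, not part of the real platform interface.

#include <algorithm>
#include <chrono>
#include <cstddef>

struct batched_send_info_t;  // the real struct lives in platform/common.h

// Assumed helpers, for illustration only.
std::size_t packet_count(const batched_send_info_t &send_info);
bool send_sub_batch(batched_send_info_t &send_info, std::size_t offset, std::size_t count);
void spin_wait(std::chrono::nanoseconds time);  // busy wait, sketched further down the thread

// Send the batch in blocks of at most pacing_block_size_in_packets packets,
// waiting pacing_block_duration between blocks so downstream buffers can drain.
bool send_batch_paced(batched_send_info_t &send_info,
                      std::size_t pacing_block_size_in_packets,
                      std::chrono::nanoseconds pacing_block_duration) {
  const std::size_t total = packet_count(send_info);
  for (std::size_t offset = 0; offset < total; offset += pacing_block_size_in_packets) {
    const std::size_t count = std::min(pacing_block_size_in_packets, total - offset);
    if (!send_sub_batch(send_info, offset, count)) {
      return false;
    }
    if (offset + count < total) {
      spin_wait(pacing_block_duration);
    }
  }
  return true;
}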

@cgutman
Collaborator

cgutman commented Jul 4, 2024

@ns6089 yep, I agree. I think our batch size will have to be small enough that using OS sleep functionality will overshoot our intended sleep time (definitely on Windows, where the best you can get is roughly 0.5 ms). The way FEC blocks are computed today basically means that the FEC computation itself acts as a little busy wait between blocks, and that's what keeps the current system working today.

At least since we are actually intending to busy wait rather than doing actual work, we can put some code in the body of our busy waiting loop to allow other SMT threads to make progress, like:

#if defined(__i386__) || defined(__x86_64__)
__builtin_ia32_pause();
#elif defined(__aarch64__)
__yield();
#endif

Recent versions of Boost have sp_thread_pause(), which is exactly what we need, but I think it's only present in Boost 1.83 and later, so we can't rely on it.

@ns6089
Contributor

ns6089 commented Jul 4, 2024

At least since we are actually intending to busy wait rather than doing actual work, we can put some code in the body of our busy waiting loop to allow other SMT threads to make progress

Yeah, we can implement some spin_wait(std::chrono::nanoseconds time) function that will do these instructions in a loop and check the time periodically, possibly dynamically adjusting the number of instructions between time checks.
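For instance, a spin_wait along those lines might look like the following sketch; the fixed 64 pauses between clock reads stands in for the dynamic adjustment mentioned above:

#include <chrono>

// Busy wait for roughly `time`, letting the SMT sibling make progress
// between clock reads. The pause count per check is fixed here, but could
// be tuned at runtime from how long each batch of pauses is observed to take.
void spin_wait(std::chrono::nanoseconds time) {
  const auto deadline = std::chrono::steady_clock::now() + time;
  constexpr int pauses_per_check = 64;

  while (std::chrono::steady_clock::now() < deadline) {
    for (int i = 0; i < pauses_per_check; ++i) {
#if defined(__i386__) || defined(__x86_64__)
      __builtin_ia32_pause();
#elif defined(__aarch64__)
      asm volatile("yield");
#endif
    }
  }
}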

The way FEC blocks are computed today basically means that the FEC computation itself acts as a little busy wait between blocks, and that's what keeps the current system working today.

Interesting. We should probably make use of this and do the pacing in stream.cpp. I don't think FEC encoding can be done progressively, but packet headers and especially encryption totally can. @cgutman, unless you have something half-implemented and want to finish it, I will grab your FEC block size optimization PR and try to add pacing on top of it. Later on we should be able to expand it into some proper dynamic flow control, hopefully.

@ns6089 ns6089 mentioned this pull request Jul 4, 2024
@ReenigneArcher
Member

Closing in favor of #2803
