
Bump MAX_FEC_BLOCKS to 4 #2787

Closed
wants to merge 1 commit into from

Conversation

CypherGrue

@CypherGrue CypherGrue commented Jul 1, 2024

Description

Sunshine implementation arbitrarily limits itself to 3 FEC blocks even though the protocol supports 4. This is fine for typical use cases. However, when dealing with large payloads (low FPS, high bitrate), there is a risk that the resulting packet size exceeds the capability of the error correction code.

This change removes magic constants to make full use of the error-correcting bandwidth defined by the protocol and supported by Moonlight.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Dependency update (updates to dependencies)
  • Documentation update (changes to documentation)
  • Repository update (changes to repository files, e.g. .github/...)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch must be updated before it can be merged. You must also allow edits from maintainers.

  • I want maintainers to keep my branch updated

Sunshine implementation arbitrarily limits itself to 3 FEC blocks even though the protocol supports 4. This is fine for most use cases. However, when dealing with large payloads (low FPS, high bitrate), there is a risk that the resulting packet size exceeds the capability of the error correction code.

This change removes magic constants to make full use of the protocol.
@CLAassistant

CLAassistant commented Jul 1, 2024

CLA assistant check
All committers have signed the CLA.

@cgutman
Collaborator

cgutman commented Jul 1, 2024

We should do something like #1466 instead of this. The real problem is that our FEC groups are hardcoded rather than dynamic as they would be in that PR.

The issue is that proper FEC calculation is uncovering our longstanding issues with exhaustion of packet buffers in switches and NICs, so we need some packet pacing changes to support this properly.

@radugrecu97

This comment was marked as off-topic.

@ns6089
Contributor

ns6089 commented Jul 2, 2024

@cgutman If feedback-based packet pacing is not ready, maybe in the meantime we can hardcode 1GbE pacing? This should resolve most of the local streaming problems without adding a significant amount of latency.

@cgutman
Collaborator

cgutman commented Jul 3, 2024

Yeah, static pacing would probably be enough to unblock #1466

@ns6089
Contributor

ns6089 commented Jul 3, 2024

Yeah, static pacing would probably be enough

We have

bool send_batch(batched_send_info_t &send_info);
bool send(send_info_t &send_info);

in platform/common.h, if we add

bool send_batch_paced(batched_send_info_t &send_info,
                      size_t pacing_block_size,
                      std::chrono::nanoseconds pacing_block_duration);

In your opinion, could it be reused later on for dynamic pacing, or would you prefer a different interface?

@ns6089
Contributor

ns6089 commented Jul 3, 2024

Or even something as simple as

bool send_batch_paced(batched_send_info_t &send_info, double bytes_per_second);

I don't think we want to do busy-waiting, so we will have to rely on sleeps with limited precision. The overall pacing will look like this:

  1. Dump 1ms worth of data
  2. Sleep for 1ms
  3. Check how long we actually slept and adjust next batch size
  4. Dump, sleep and repeat
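A minimal sketch of that loop, assuming a hypothetical send_chunk helper for the actual transmission (none of these names come from the Sunshine code base):

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>

// Placeholder for whatever actually transmits `len` bytes starting at `offset`.
void send_chunk(std::size_t offset, std::size_t len);

// Sketch of sleep-based pacing: send ~1 ms worth of data, sleep for 1 ms,
// then size the next batch based on how long we actually slept.
void paced_send(double bytes_per_second, std::size_t total_bytes) {
  using namespace std::chrono;
  std::size_t sent = 0;
  auto batch_bytes = static_cast<std::size_t>(bytes_per_second / 1000.0);  // ~1 ms of data

  while (sent < total_bytes) {
    auto before = steady_clock::now();

    std::size_t n = std::min(batch_bytes, total_bytes - sent);
    send_chunk(sent, n);
    sent += n;

    std::this_thread::sleep_for(milliseconds(1));

    // Step 3 above: measure the real elapsed time (sleep overshoot included)
    // and resize the next batch to keep the average rate on target.
    auto elapsed = duration_cast<duration<double>>(steady_clock::now() - before).count();
    batch_bytes = static_cast<std::size_t>(bytes_per_second * elapsed);
  }
}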

@ns6089
Contributor

ns6089 commented Jul 3, 2024

After looking more deeply into this, we may have to resort to busy spin waiting after all.
RX/TX buffers seem to be more limited than I previously thought; we probably shouldn't batch more than 100 packets at once, maybe even fewer.
So yeah, this brings us back to the original function

bool send_batch_paced(batched_send_info_t &send_info,
                      size_t pacing_block_size_in_packets,
                      std::chrono::nanoseconds pacing_block_duration);

@cgutman If you're ok with this implementation, I can work on it.
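A rough outline of what that could look like. Everything here is illustrative: packet_count, send_sub_batch, and spin_wait are stand-ins, not part of the real platform interface.

#include <algorithm>
#include <chrono>
#include <cstddef>

struct batched_send_info_t;  // the real struct lives in platform/common.h

// Assumed helpers, for illustration only.
std::size_t packet_count(const batched_send_info_t &send_info);
bool send_sub_batch(batched_send_info_t &send_info, std::size_t offset, std::size_t count);
void spin_wait(std::chrono::nanoseconds time);  // busy wait, sketched further down the thread

// Send the batch in blocks of at most pacing_block_size_in_packets packets,
// waiting pacing_block_duration between blocks so downstream buffers can drain.
bool send_batch_paced(batched_send_info_t &send_info,
                      std::size_t pacing_block_size_in_packets,
                      std::chrono::nanoseconds pacing_block_duration) {
  const std::size_t total = packet_count(send_info);
  for (std::size_t offset = 0; offset < total; offset += pacing_block_size_in_packets) {
    const std::size_t count = std::min(pacing_block_size_in_packets, total - offset);
    if (!send_sub_batch(send_info, offset, count)) {
      return false;
    }
    if (offset + count < total) {
      spin_wait(pacing_block_duration);
    }
  }
  return true;
}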

@cgutman
Collaborator

cgutman commented Jul 4, 2024

@ns6089 yep, I agree. I think our batch size will have to be small enough that using OS sleep functionality will overshoot our intended sleep time (definitely on Windows, where the best you can get is roughly 0.5 ms). The way FEC blocks are computed today basically means that the FEC computation itself acts as a little busy wait between blocks, and that's what keeps the current system working today.

At least since we are actually intending to busy wait rather than doing actual work, we can put some code in the body of our busy waiting loop to allow other SMT threads to make progress, like:

#if defined(__i386__) || defined(__x86_64__)
__builtin_ia32_pause();
#elif defined(__aarch64__)
__yield();
#endif

Recent versions of Boost have sp_thread_pause(), which is exactly what we need, but I think it's only present in Boost 1.83 and later, so we can't rely on it.

@ns6089
Contributor

ns6089 commented Jul 4, 2024

At least since we are actually intending to busy wait rather than doing actual work, we can put some code in the body of our busy waiting loop to allow other SMT threads to make progress

Yeah, we can implement some spin_wait(std::chrono::nanoseconds time) function that will do these instructions in a loop and check the time periodically, possibly dynamically adjusting the number of instructions between time checks.
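For instance, a spin_wait along those lines might look like the following sketch; the fixed 64 pauses between clock reads stands in for the dynamic adjustment mentioned above:

#include <chrono>

// Busy wait for roughly `time`, letting the SMT sibling make progress
// between clock reads. The pause count per check is fixed here, but could
// be tuned at runtime from how long each batch of pauses is observed to take.
void spin_wait(std::chrono::nanoseconds time) {
  const auto deadline = std::chrono::steady_clock::now() + time;
  constexpr int pauses_per_check = 64;

  while (std::chrono::steady_clock::now() < deadline) {
    for (int i = 0; i < pauses_per_check; ++i) {
#if defined(__i386__) || defined(__x86_64__)
      __builtin_ia32_pause();
#elif defined(__aarch64__)
      asm volatile("yield");
#endif
    }
  }
}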

The way FEC blocks are computed today basically means that the FEC computation itself acts as a little busy wait between blocks, and that's what keeps the current system working today.

Interesting. We should probably make use of this and do the pacing in stream.cpp. I don't think FEC encoding can be done progressively, but packet headers and especially encryption totally can. @cgutman, unless you have something half-implemented and want to finish it, I will grab your FEC block size optimization PR and try to add pacing on top of it. Later on we should be able to expand it into some proper dynamic flow control, hopefully.

@ns6089 ns6089 mentioned this pull request Jul 4, 2024
@ReenigneArcher
Member

Closing in favor of #2803
