Implement basic network flow control #2803
Conversation
Basic implementation wasn't that hard, but it uncovered a few problems.
@cgutman Have you considered alternative RS implementations that make use of SIMD instructions?
I don't think this needs to block merging this PR, unless you think these changes are going to make it worse somehow.
Sunshine/cmake/targets/common.cmake, lines 88 to 90 in c92e4f2
If SSSE3 resolves the performance issues you saw at the bitrates we're targeting, that's probably enough. If we want to go further, we can use GCC/Clang function multiversioning to have the compiler build AVX2, SSSE3, and SSE2 variants, but that requires more extensive modifications. We could also switch to a totally new library if necessary.
Yeah, enabling SSSE3 for nanors gives RS encode around a 6x speedup, bringing it into the sub-millisecond range. No need to bother with a separate thread then, one less thing to worry about. We should absolutely enable it.
Excellent, that's a pretty substantial win. In terms of hardware support we're dropping with SSSE3, it looks like Intel CPUs prior to Core 2/Atom (~2006), AMD CPUs prior to Bulldozer (~2011), and VIA CPUs prior to Nano (~2008). All common x64 emulation layers like Rosetta 2 and XTA/Prism support SSSE3, so there should be no issues there either. The only non-SSSE3 CPUs that might otherwise be performant enough for some modern games/applications would be AMD K10-based processors (Phenom II). However, those also lack AES-NI, so they're going to get punished by encryption too, with ~20x more cycles/byte versus AES-NI. Overall, I think bumping the CPU requirement up to SSSE3 is probably reasonable. Users with very old CPUs can use an older version of Sunshine.
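For reference, the GCC/Clang function multiversioning mentioned above could look roughly like the sketch below. This is an illustration only, not nanors' or Sunshine's actual code; the function name and the scalar loop are invented for the example. The compiler builds one clone per listed target and an ifunc resolver picks the fastest supported variant at load time.

```cpp
#include <cstddef>
#include <cstdint>

// One clone is emitted per listed target; each clone is auto-vectorized with
// that target's instruction set. Function name and loop are illustrative.
__attribute__((target_clones("avx2", "ssse3", "default")))
void xor_block(std::uint8_t *dst, const std::uint8_t *src, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    dst[i] ^= src[i];
  }
}
```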
@cgutman This generic low-latency flow control doesn't seem to be working well, at least on Windows.
Or try using QoS shaping: https://learn.microsoft.com/en-us/windows/win32/api/qos2/ns-qos2-qos_flowrate_outgoing
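A rough sketch of what using that qos2 flow-rate API could look like is below. Error handling, cleanup, and the function name are simplified and illustrative; this is not what the PR implements. Real code would keep the QoS handle and flow alive for the stream's lifetime.

```cpp
#include <winsock2.h>
#include <qos2.h>

#pragma comment(lib, "qwave.lib")

// Sketch only: attach a socket to a QoS flow and ask the OS pacer to shape
// outgoing traffic to `bits_per_second`.
bool shape_outgoing(SOCKET sock, sockaddr *peer, UINT64 bits_per_second) {
  QOS_VERSION version {1, 0};
  HANDLE qos_handle = nullptr;
  if (!QOSCreateHandle(&version, &qos_handle)) {
    return false;
  }

  QOS_FLOWID flow_id = 0;
  if (!QOSAddSocketToFlow(qos_handle, sock, peer, QOSTrafficTypeAudioVideo,
                          QOS_NON_ADAPTIVE_FLOW, &flow_id)) {
    QOSCloseHandle(qos_handle);
    return false;
  }

  QOS_FLOWRATE_OUTGOING rate {};
  rate.Bandwidth = bits_per_second;        // target rate in bits per second
  rate.ShapingBehavior = QOSShapeAndMark;  // let the OS pace (and DSCP-mark) packets
  rate.Reason = QOSFlowRateNotApplicable;

  return QOSSetFlow(qos_handle, flow_id, QOSSetOutgoingRate,
                    sizeof(rate), &rate, 0, nullptr) != FALSE;
}
```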
Oh, turns out my travel wifi router has an atrocious Ethernet switch and more or less requires 0.1 ms batches. I guess this is the kind of device we have problems with. Will try to optimize for it I guess, even though
Two things were necessary when dealing with this switch
No sleep or pacing was needed; the socket buffer took care of the congestion. I guess I can try adding some throttling logic on top of this larger buffer. In theory it should be enough to appease slow clients without resorting to costly spin waits with periodic latency spikes.
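For illustration, enlarging the kernel send buffer is a single setsockopt call; the sketch below is a generic example, and the 1 MB value is arbitrary rather than the size chosen in this PR.

```cpp
#ifdef _WIN32
  #include <winsock2.h>
  using socket_t = SOCKET;
#else
  #include <sys/socket.h>
  using socket_t = int;
#endif

// Ask the OS for a larger send buffer so bursts of packets queue in the
// kernel instead of overflowing. Example value only.
inline void enlarge_send_buffer(socket_t sock, int bytes = 1024 * 1024) {
  setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
             reinterpret_cast<const char *>(&bytes), sizeof(bytes));
}
```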
04dc814 to ea2b9a6 (Compare)
Should be ready. Works really well on Windows from my tests; haven't tested Linux or macOS. @cgutman Maybe we can distinguish WAN streaming from LAN streaming based on the packet size, and apply more aggressive throttling in that case, based on the requested bitrate? Something like 2 or 3 times the bitrate would give 1/2 or 1/3 of a frame time of latency.
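If throttling at a multiple of the requested bitrate is added later, it could be a simple token bucket along these lines. This is a hypothetical sketch: the class and member names are invented for the example and the burst cap is arbitrary.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical pacer: allows sending at `multiplier` times the requested
// bitrate and tells the caller how long to wait when it gets ahead of that rate.
class bitrate_pacer {
  using clock = std::chrono::steady_clock;

public:
  bitrate_pacer(std::uint64_t bitrate_bps, double multiplier)
      : bytes_per_sec_(static_cast<double>(bitrate_bps) / 8.0 * multiplier),
        last_(clock::now()) {}

  // Returns how long the caller should sleep before sending `bytes` more data.
  std::chrono::nanoseconds delay_for(std::size_t bytes) {
    auto now = clock::now();
    credit_ += std::chrono::duration<double>(now - last_).count() * bytes_per_sec_;
    last_ = now;

    // Cap unused credit at ~10 ms worth of data so idle periods don't allow
    // arbitrarily large bursts afterwards (arbitrary example value).
    credit_ = std::min(credit_, bytes_per_sec_ / 100.0);

    credit_ -= static_cast<double>(bytes);
    if (credit_ >= 0.0) {
      return std::chrono::nanoseconds {0};
    }
    return std::chrono::nanoseconds {
      static_cast<std::int64_t>(-credit_ / bytes_per_sec_ * 1e9)};
  }

private:
  double bytes_per_sec_;
  double credit_ = 0.0;
  clock::time_point last_;
};
```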
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #2803 +/- ##
=======================================
Coverage 8.99% 8.99%
=======================================
Files 95 95
Lines 17312 17412 +100
Branches 8236 8272 +36
=======================================
+ Hits 1557 1567 +10
- Misses 12890 13118 +228
+ Partials 2865 2727 -138
Flags with carried forward coverage won't be shown.
Fixed the typo in the Linux/macOS code, builds shouldn't fail now.
I also have a follow-up refactoring of the periodic loggers that removes most of the syntactic bloat from them, improving readability.
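Purely as a hypothetical illustration of what reducing periodic-logger boilerplate could mean (this is not the actual follow-up refactoring), a single rate-limited log call per site might look like:

```cpp
#include <chrono>
#include <iostream>
#include <string_view>

// Hypothetical helper: emits a message at most once per period.
class periodic_logger {
public:
  explicit periodic_logger(std::chrono::steady_clock::duration period)
      : period_(period) {}

  // Logs only if `period_` has elapsed since the last emission.
  void log(std::string_view msg) {
    auto now = std::chrono::steady_clock::now();
    if (now - last_ >= period_) {
      last_ = now;
      std::cout << msg << '\n';
    }
  }

private:
  std::chrono::steady_clock::duration period_;
  std::chrono::steady_clock::time_point last_ {};
};

// Usage: one static instance per call site, e.g.
//   static periodic_logger slow_client_log {std::chrono::seconds(5)};
//   slow_client_log.log("client is not keeping up with the send rate");
```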
Pacing behavior seems great in my testing on Windows and Linux. I'm seeing the expected behavior of densely packed FEC blocks without the nasty packet loss I had in my previous attempt.
Once these last few issues are resolved, this looks good to merge.
We were too conservative in determining our max data size before needing to split, which resulted in many frames being split into multiple FEC blocks unnecessarily. We also used a hardcoded split into 3 blocks instead of calculating how many blocks are actually required.
Co-authored-by: Cameron Gutman <[email protected]>
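The block-count part of the fix described above boils down to a ceiling division; a minimal sketch, with illustrative names rather than Sunshine's actual identifiers:

```cpp
#include <cstddef>

// Number of FEC blocks needed: payload size divided by per-block payload
// capacity, rounded up, rather than a hardcoded split into 3.
std::size_t fec_blocks_needed(std::size_t payload_bytes, std::size_t max_block_payload) {
  return (payload_bytes + max_block_payload - 1) / max_block_payload;
}
```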
Description
Combat RX/TX buffer overflows and improve multi-FEC on large frames.
Adopted from #1466
Should supersede #2787
Screenshot
Issues Fixed or Closed
Type of Change
Checklist
Branch Updates
LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch must be updated before it can be merged. You must also Allow edits from maintainers.