Intel QuickSync support #77

Closed
wants to merge 13 commits into from

@caioavidal caioavidal commented Feb 27, 2022

Description

  • Updated prebuilt ffmpeg library
  • Add qsv encoder support

I have been trying to add an Intel Quick Sync (QSV) encoder to this project. The encoder itself seems to work fine: I tested it by writing the captured screen to a file. The problem is that Moonlight does not seem to recognize the IDR frames. It waits for one indefinitely and then disconnects.
I'm attaching the output from ffprobe here:

frame_qsv.txt

Output from Moonlight Qt:
00:00:19 - SDL Info (0): FFmpeg-based video decoder chosen
00:00:19 - SDL Info (0): Dropping window event during flush: 6 (1920 1080)
00:00:20 - SDL Info (0): Received first video packet after 2100 ms
00:00:20 - SDL Info (0): Waiting for IDR frame
00:00:20 - SDL Info (0): Waiting for IDR frame
00:00:20 - SDL Info (0): Waiting for IDR frame

I have tested the Moonlight Android, iOS, and Windows apps, and all of them show a black screen once connected.

Please let me know what you think.

Type of Change

  • New feature (non-breaking change which adds functionality)

Member

@ReenigneArcher ReenigneArcher left a comment

In addition to the other comments... does anything need to change in the example conf file? Anything in the webui in order to enable this encoder when not using automatic mode?

CMakeLists.txt (outdated, resolved)
@@ -1,1050 +1,1050 @@
/*
Member

If the file is not needed, delete it.

Author

Deleted

Contributor

@ReenigneArcher why was this file deleted? As of https://github.com/SunshineStream/Sunshine/blob/master/third-party/cbs/cbs.c the file is present inside the repository and it will be removed by this PR

Member

@ReenigneArcher ReenigneArcher Feb 28, 2022

> @ReenigneArcher why was this file deleted? As of https://github.com/SunshineStream/Sunshine/blob/master/third-party/cbs/cbs.c the file is present inside the repository and it will be removed by this PR

I am not sure... when I left the comment the entire file was commented out. See here: https://github.com/SunshineStream/Sunshine/blob/b43ae668fb5f9017759df005d80ecf5d3b2ef5f5/third-party/cbs/cbs.c

Author

@caioavidal caioavidal Feb 28, 2022

@TheElixZammuto Because the new ffmpeg library already includes a cbs.c implementation. I had to update ffmpeg and also add libmfx for QSV support.

@@ -575,6 +602,7 @@ static std::vector<encoder_t> encoders {
#endif
#ifdef _WIN32
amdvce,
quicksync,
Member

Would this encoder work on Linux and/or Mac as well?

Author

Not sure, I have only tested on Windows. I can test it on Linux and Mac once I get this working well.

@@ -1810,6 +1859,8 @@ platf::mem_type_e map_dev_type(AVHWDeviceType type) {
return platf::mem_type_e::dxgi;
case AV_HWDEVICE_TYPE_VAAPI:
return platf::mem_type_e::vaapi;
case AV_HWDEVICE_TYPE_QSV:
Contributor

Does Intel QSV allow hardware input? The system mem type works by copying the texture to system RAM and then encoding, which can hurt performance.
The ideal option is to encode directly from the GPU (like AMF and NVENC, which use platf::mem_type_e::dxgi).

Author

@caioavidal caioavidal Feb 28, 2022

Yes, I have tried that approach, but the send_frame function simply throws. The only way I could get this working was by copying to system RAM. I will investigate it later; for now I'm focusing on making it work.

Collaborator

@cgutman cgutman Mar 1, 2022

It looks like hwcontext_qsv.c can do it, but you might need to derive your QSV hwcontext from a D3D11VA hwcontext in order for it to work.

After that, should be as simple as taking the D3D11VA frame we fake up here and doing:

AVFrame* qsv_frame = av_frame_alloc();
qsv_frame->format = AV_PIX_FMT_QSV;
av_hwframe_map(qsv_frame, dxgi_frame, AV_HWFRAME_MAP_READ);

I agree that this should probably be done in a later PR though.

Author

@caioavidal caioavidal Mar 2, 2022

Tried to derive from D3D11VA but unfortunately getting
[AVHWDeviceContext @ 000001a860c0b3c0] Error setting child device handle: -16

Not sure if I am doing this the right way:
av_hwdevice_ctx_create_derived(&hw_device_ctx, AV_HWDEVICE_TYPE_QSV, d3d11va.get(), 0);

Author

@cgutman just posted a question on stackoverflow. If you have any clue on that pls let me know. thanks
https://stackoverflow.com/questions/71356395/derive-qsv-hwdevice-from-d3d11va-hwdevice-using-ffmpeg

@TheElixZammuto
Contributor

TheElixZammuto commented Feb 28, 2022

My tests unfortunately were unfruitful: https://gist.github.com/TheElixZammuto/da1f117f1b01003356cffa5f2833bb79

Windows 11 build 22000, Intel Core i5 5005U, Driver Version 20.19.15.5063

AV_PIX_FMT_NV12, AV_PIX_FMT_P010,
{
{
{ "forced-idr"s, 1 },
Contributor

Author

Yes, but I have tried forced_idr as well. Actually, I managed to make it send only IDR frames, but then I get:
00:10:10 - SDL Info (0): Unrecoverable frame 740637842: lost FEC blocks 1 to 3
00:10:10 - SDL Info (0): Unrecoverable frame -1801679895: lost FEC blocks 1 to 3
00:10:10 - SDL Info (0): Unrecoverable frame 1140981760: 0+2=2 received < 659 needed

I have tried many options to solve this "Waiting for IDR frame" error but none of them seemed to work

Collaborator

If you capture the raw encoder output to a file, I can take a look at the bitstream and figure out what's going on.

Does it affect both H.264 and HEVC?
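
To make that kind of bitstream inspection concrete, here is a minimal C++ sketch that lists the NAL units in a captured Annex B buffer, reporting where each start code begins, whether it is the 3-byte or 4-byte form, and the first header byte. It is not part of Sunshine or Moonlight; `NalInfo` and `scan_annexb` are hypothetical names chosen for illustration, and it assumes the whole capture has already been read into memory.

```cpp
#include <cassert>  // only needed if you add checks like the ones in the note below
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record describing one NAL unit found in an Annex B stream.
struct NalInfo {
    std::size_t offset;      // index of the first byte of the start code
    int start_code_len;      // 3 (00 00 01) or 4 (00 00 00 01)
    std::uint8_t header;     // first byte after the start code
};

// Walk the buffer and record every start code that is followed by at least
// one header byte. A zero byte directly before 00 00 01 is counted as part
// of a 4-byte start code.
std::vector<NalInfo> scan_annexb(const std::vector<std::uint8_t>& data) {
    std::vector<NalInfo> found;
    for (std::size_t i = 0; i + 3 < data.size(); ++i) {
        if (data[i] != 0 || data[i + 1] != 0 || data[i + 2] != 1) {
            continue;
        }
        const bool four = i > 0 && data[i - 1] == 0;
        found.push_back({four ? i - 1 : i, four ? 4 : 3, data[i + 3]});
        i += 2;  // skip the rest of the start code; the loop adds one more
    }
    return found;
}
```

For H.264, `header & 0x1F` gives nal_unit_type and `(header >> 5) & 0x3` gives nal_ref_idc; HEVC uses a two-byte NAL header, so the decoding differs there.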

Author

@caioavidal caioavidal Mar 1, 2022

Author

@caioavidal caioavidal Mar 1, 2022

Not sure if this helps, but I compared the software encoder's bitstream against the QSV one, and it looks like nal_ref_idc differs. For instance, for the SPS, PPS, and IDR NALUs it is 1 in the QSV bitstream, while it is 3 in the software bitstream.

Author

@caioavidal caioavidal Mar 1, 2022

OK, I finally got video, and frames are running fine on Moonlight Android.
I basically needed to do this:

    payload_new = replace(payload, "\000\000\000\001\'"sv, "\000\000\000\001g"sv);
    payload     = { (char *)payload_new.data(), payload_new.size() };

    payload_new = replace(payload, "\000\000\000\001("sv, "\000\000\000\001h"sv);
    payload     = { (char *)payload_new.data(), payload_new.size() };

    payload_new = replace(payload, "\000\000\001%"sv, "\000\000\001e"sv);
    payload     = { (char *)payload_new.data(), payload_new.size() };

It sets the nal_ref_idc of the SPS, PPS, and IDR NALUs to 3.
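
The byte values in those replacements follow from the one-byte H.264 NAL header layout: 1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type. The sketch below decodes that byte and rewrites only the nal_ref_idc field. `H264NalHeader`, `decode_nal_header`, and `with_nal_ref_idc` are hypothetical names for illustration, not Sunshine functions.

```cpp
#include <cassert>
#include <cstdint>

// Fields of the one-byte H.264 NAL unit header.
struct H264NalHeader {
    unsigned forbidden_zero_bit;  // 1 bit, always 0 in a valid stream
    unsigned nal_ref_idc;         // 2 bits
    unsigned nal_unit_type;       // 5 bits (5 = IDR slice, 7 = SPS, 8 = PPS)
};

inline H264NalHeader decode_nal_header(std::uint8_t b) {
    return {(b >> 7) & 0x1u, (b >> 5) & 0x3u, b & 0x1Fu};
}

// Return the header byte with nal_ref_idc replaced; the forbidden bit and
// the NAL unit type are kept (mask 0x9F = 1001 1111).
inline std::uint8_t with_nal_ref_idc(std::uint8_t b, unsigned ref_idc) {
    assert(ref_idc <= 3);
    return static_cast<std::uint8_t>((b & 0x9Fu) | (ref_idc << 5));
}
```

With this, `'\''` (0x27) decodes as an SPS with nal_ref_idc 1 and becomes `'g'` (0x67) when nal_ref_idc is set to 3; likewise `'('` (0x28, PPS) becomes `'h'` (0x68) and `'%'` (0x25, IDR slice) becomes `'e'` (0x65).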

Collaborator

Very interesting. Old versions of Moonlight did care about nal_ref_idc (in violation of the H.264 and HEVC specification), but recent versions (within the last year or so) shouldn't care. The fix was moonlight-stream/moonlight-common-c@b528867

I wonder if what's going on is that Sunshine itself is breaking the bitstream (or rather the fixups it's trying to perform aren't working because the original payload doesn't have the expected bytes).

Collaborator

@cgutman cgutman Mar 2, 2022

I looked at output.h264 and it looks like the issue is that the NALU for the IDR image has the 3-byte Annex B start code rather than the 4-byte Annex B start code that Moonlight expects for the start of a frame (and which is used for the SPS, PPS, and SEI NALUs).

image

I think using a 3-byte start code is legal, but just changing moonlight-common-c to accept it would expose bugs in some clients (iOS at least) that assume the NALU start prefixes are always 4 bytes long. Handling both 3 and 4 byte start prefixes would complicate the logic we use on iOS to convert from Annex B to AVCC format.

I think it makes sense to continue to fix this up on the Sunshine side, but we can make the logic more robust to avoid getting confused if the first NALU byte varies between decoders.
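
One way such a fixup could avoid depending on the first NALU byte is to normalize every start code instead of matching specific byte sequences. The following is only a sketch under that assumption, not the code Sunshine uses; `normalize_start_codes` is a hypothetical helper. It relies on the fact that emulation prevention keeps the sequence 00 00 01 from appearing inside a NAL unit payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy an Annex B buffer, rewriting every 3-byte start
// code (00 00 01) as the 4-byte form (00 00 00 01). Start codes that already
// have a leading zero byte are left unchanged.
std::vector<std::uint8_t> normalize_start_codes(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    out.reserve(in.size() + 16);
    std::size_t i = 0;
    while (i < in.size()) {
        const bool at_start_code = i + 2 < in.size() &&
                                   in[i] == 0 && in[i + 1] == 0 && in[i + 2] == 1;
        if (!at_start_code) {
            out.push_back(in[i]);
            ++i;
            continue;
        }
        // The byte before 00 00 01 was already copied; if it was zero, the
        // start code is already four bytes long.
        const bool already_four = i > 0 && in[i - 1] == 0;
        if (!already_four) {
            out.push_back(0);
        }
        out.push_back(0);
        out.push_back(0);
        out.push_back(1);
        i += 3;
    }
    assert(out.size() >= in.size());  // bytes are only ever added
    return out;
}
```

This costs an extra copy of each encoded packet. An in-place variant is possible but would have to shift the tail of the buffer each time a byte is inserted.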

Author

Yeah, makes total sense to me. Thanks for your support.

@caioavidal
Author

> My tests unfortunately were unfruitful: https://gist.github.com/TheElixZammuto/da1f117f1b01003356cffa5f2833bb79
>
> Windows 11 build 22000, Intel Core i5 5005U, Driver Version 20.19.15.5063

Hey @TheElixZammuto. Could you try again? We have new commits now.

@TheElixZammuto
Contributor

So, I did some tests on the same machine as above, connected on Ethernet to the client.
While the Intel codec now gets detected, performance is definitely not on par with the AMD encoder, and is more similar to the software encoder (in terms of latency).

[Screenshot: 2022-03-20 18_23_21-Gestione attività (Task Manager)]
This is the resource consumption of Sunshine on the Intel machine.
[Screenshot: 2022-03-20 18_24_35-Moonlight]
And this is the consumption of Sunshine on the AMD machine.

My gut is telling me that some calculations are still done on the CPU rather than on the GPU (due to the much higher memory usage), but I'm not sure what it could be (my guess is that it's something related to this).

@caioavidal
Author

caioavidal commented Mar 21, 2022

> So, I did some tests on the same machine as above, connected on Ethernet to the client. While the Intel codec now gets detected, performance is definitely not on par with the AMD encoder, and is more similar to the software encoder (in terms of latency).
>
> [Screenshot: 2022-03-20 18_23_21-Gestione attività (Task Manager)] This is the resource consumption of Sunshine on the Intel machine. [Screenshot: 2022-03-20 18_24_35-Moonlight] And this is the consumption of Sunshine on the AMD machine.
>
> My gut is telling me that some calculations are still done on the CPU rather than on the GPU (due to the much higher memory usage), but I'm not sure what it could be (my guess is that it's something related to this).

Thanks for testing again. Yes, this is due to the scale and hwframe_transfer_data steps. I'm trying to implement it using a D3D11 device, but I'm getting "device failed" while encoding.

@github-actions

This PR is stale because it has been open for 30 days with no activity. Remove the stale label or comment, otherwise this will be closed in 5 days.

@github-actions github-actions bot added the stale label Apr 21, 2022
@github-actions

This PR was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this Apr 27, 2022
@TheElixZammuto
Contributor

TheElixZammuto commented Apr 29, 2022

@caioavidal Which version were you using during your tests? It seems that D3D11 support for QSV was added at a later date than the FFmpeg prebuilts: FFmpeg/FFmpeg@a08a529

EDIT: Probably we need FFMPEG 5.0 for this

@caioavidal
Author

caioavidal commented Apr 29, 2022

> @caioavidal Which version were you using during your tests? It seems that D3D11 support for QSV was added at a later date than the FFmpeg prebuilts: FFmpeg/FFmpeg@a08a529
>
> EDIT: Probably we need FFMPEG 5.0 for this

Not really sure. I'm using version N-105642-g538be75a69 (FFmpeg/FFmpeg@538be75a69).
It seems to already support D3D11 for QSV. I'm still getting "device failed" while trying to encode the D3D11 texture.
Right now I'm doing the following:

  • create d3d11va device
  • derive qsv device from d3d11va
  • derive qsv hwframe from d3d11va
  • add texture to d3d11 hwframe
  • map hwframe from d3d11 to qsv
  • encode qsv frame (this is where I get "device failed"; everything runs fine up to this point)

I wonder what I am doing wrong.

@ReenigneArcher
Member

> EDIT: Probably we need FFMPEG 5.0 for this

In that case, we probably need to get ffmpeg 5.0 done in a separate PR. I'm told that using cpack will include the dependencies for the user, so that will simplify build-time versus run-time dependencies.

Maybe we address ffmpeg 5.0 after this? #139

@github-actions

This PR is stale because it has been open for 30 days with no activity. Remove the stale label or comment, otherwise this will be closed in 5 days.

@github-actions github-actions bot added the stale label May 30, 2022
@github-actions

github-actions bot commented Jun 5, 2022

This PR was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this Jun 5, 2022
@Saijin-Naib

Is there still movement on this front (pending major FFMPEG upgrade)? If so, I'd like to help test as I can since I am using Intel/QuickSync on Linux, and plan on getting the Intel Arc dGPUs as soon as they launch.

@ReenigneArcher
Member

> Is there still movement on this front (pending major FFMPEG upgrade)? If so, I'd like to help test as I can since I am using Intel/QuickSync on Linux, and plan on getting the Intel Arc dGPUs as soon as they launch.

ffmpeg still needs to be upgraded (on all platforms/builds)...

@Saijin-Naib

Okay, thanks for confirming that the immediate blocker is the pending major FFMPEG upgrade work. I have heard from other projects that can be a bit of a challenge. I'll watch this space.

@ytoaa

ytoaa commented Nov 25, 2022

Intel qsv encoding is not yet supported in Windows.

@ReenigneArcher
Member

> Intel qsv encoding is not yet supported in Windows.

This PR was never completed or merged.

@ytoaa

ytoaa commented Nov 25, 2022

> > Intel qsv encoding is not yet supported in Windows.
>
> This PR was never completed or merged.

Do you think it's going to be hard to see this feature this year?

@ReenigneArcher
Member

We're working to update to ffmpeg5. From my understanding quick sync won't be very useful for sunshine because it's slower.

@ytoaa

ytoaa commented Nov 25, 2022

> We're working to update to ffmpeg5. From my understanding quick sync won't be very useful for sunshine because it's slower.

However, Intel graphics encoding using vaapi on Linux was quite satisfying to me.

@caioavidal
Author

> We're working to update to ffmpeg5. From my understanding quick sync won't be very useful for sunshine because it's slower.

It might be faster than sw encoding at least

@aandaluz

aandaluz commented Dec 1, 2022

Devs are now upgrading ffmpeg to 5.x series in #509

@brad-richardson brad-richardson mentioned this pull request Dec 23, 2022