Add standalone NVENC encoder #1427

ns6089 · 2023-07-07T10:05:00Z

Description

Add a standalone NVENC encoder, for reference frame invalidation right now and possibly for VFR-like bitrate adjustment and more somewhere down the line. The Windows version is fully functional. Linux CUDA (and possibly OpenGL) support is out of scope for this PR, but can easily be added later (at least on the encoder side).
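For readers unfamiliar with reference frame invalidation, here is a minimal sketch of the decision the host has to make when the client reports lost frames. It is not the code from this PR: `loss_action`, `handle_frame_loss` and all parameter names are hypothetical, and the actual call into NVENC is deliberately left out. The sketch assumes the encoder references only the previous frame, so every frame from the first lost one up to the newest encoded one is treated as unusable, and invalidation only helps if at least one older, clean frame is still in the reference buffer.

```cpp
#include <cstdint>

// Hypothetical result type, not part of this PR or of the NVENC API.
enum class loss_action {
  none,        // the report does not refer to frames that were encoded
  invalidate,  // a clean reference is still buffered, invalidation is enough
  force_idr    // every buffered reference is tainted, resync with an IDR frame
};

// Decide how to react to a client report that frames [first_lost, last_lost]
// were lost. last_encoded is the index of the newest encoded frame and
// ref_buffer_frames is how many past frames the encoder keeps as references.
inline loss_action
handle_frame_loss(uint64_t first_lost, uint64_t last_lost,
                  uint64_t last_encoded, uint64_t ref_buffer_frames) {
  if (first_lost > last_lost || first_lost > last_encoded) {
    return loss_action::none;
  }

  // With a single-frame reference chain, everything from first_lost onwards
  // depends on a lost frame and cannot be used as a reference any more.
  const uint64_t tainted = last_encoded - first_lost + 1;

  // If the tainted frames fill the whole buffer, no clean reference is left.
  if (tainted >= ref_buffer_frames) {
    return loss_action::force_idr;
  }
  return loss_action::invalidate;
}
```

With a 5-frame buffer and frame 13 as the newest encoded frame, a report for frames 10 to 12 can still be handled by invalidation, while a report for frames 5 to 6 arrives too late and needs an IDR frame.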

In later PRs

Investigate and test the viability of "increased vbv" option. Ideally, P-frames should not steal bitrate budget from future frames, maybe this can be achieved with vbv offset in encoder.
Investigate the viability of using multiple ref frames (L0 > 1) since we apply strict limits on the vbv. Maybe nvenc can intelligently pick blocks from previous frames if they have higher qp? Can't imagine this being free though.
Run VMAF benchmark for const-qp mode (document the process in .md file), and pick default values for min-qp
Check GFE default ref frame buffer values, particularly for h264 (level4 vs level5)
New configuration page and documentation
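To make the "increased vbv" item a bit more concrete, here is a rough sketch of the arithmetic involved. It assumes that the current strict limit means a VBV buffer sized for exactly one frame interval at the target bitrate, which this PR does not spell out, and `vbv_buffer_bits` and `extra_percent` are hypothetical names rather than existing options.

```cpp
#include <cassert>
#include <cstdint>

// Size in bits of a VBV buffer that holds one frame interval worth of data at
// the target bitrate, optionally enlarged by extra_percent
// (0 keeps the strict single-frame budget).
inline uint64_t
vbv_buffer_bits(uint64_t bitrate_bps, uint32_t fps, uint32_t extra_percent) {
  assert(fps > 0);
  const uint64_t single_frame = bitrate_bps / fps;
  return single_frame + single_frame * extra_percent / 100;
}
```

For a 20 Mbps stream at 60 fps the strict budget comes out at roughly 333 kbit per frame. Whether letting a frame exceed that budget pays off without hurting the following frames is exactly what the first item above still has to investigate.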

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Dependency update (updates to dependencies)
Documentation update (changes to documentation)
Repository update (changes to repository files, e.g. .github/...)

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch must be updated before it can be merged. You must also allow edits from maintainers.

I want maintainers to keep my branch updated

Things to implement before merge (may be expanded)

Refactor colorspace selection logic, it has too much duplication right now
Decide how to handle encoder ref frames buffer size, currently it's at 16 frames (dynamically based on resolution and framerate? through configuration?)
Result: use 5 ref frames buffer, which corresponds to DPB=5 in h264 terms, and DPB=6 in HEVC. Both are required minimums that must be supported by decoder profiles for given resolution.
Update: h265 level 4 only supports 4 ref frames, but in this case the client can set the limit, this level is very outdated nowadays
Investigate if NVENC can be told explicitly which ref frame to use, this can allow wider decoder support for ref frames invalidation.
Verdict: when L0 (forward prediction) list size is 1, it should always use the last frame as reference
Try to patch num_ref_frames in SPS header
Verdict: may be possible for h264, close to impossible for HEVC, either way too much hassle
Test encoder caps
NV_ENC_CAPS_SUPPORT_CABAC
NV_ENC_CAPS_WIDTH_MAX
NV_ENC_CAPS_HEIGHT_MAX
NV_ENC_CAPS_SUPPORT_CUSTOM_VBV_BUF_SIZE
NV_ENC_CAPS_SUPPORT_REF_PIC_INVALIDATION
NV_ENC_CAPS_SUPPORT_YUV444_ENCODE
NV_ENC_CAPS_SUPPORT_10BIT_ENCODE
NV_ENC_CAPS_SUPPORT_MULTIPLE_REF_FRAMES
NV_ENC_CAPS_SUPPORT_QPELMV
Switch to nv-codec-headers
Update Linux and macOS platforms to new structures
~~Use Peak-Signal-to-Noise-Ratio (Y-PSNR), Structural Similarity Index (Y-SSIM), and Video Multimethod Assessment Fusion (VMAF) for default min qp values~~ Not in this PR
~~Add configuration page and documentation~~ Not in this PR
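As background for the ref frame buffer size discussion in the list above, the H.264 side of the limit can be sketched as follows. This is an illustration and not code from this PR. The formula and the `MaxDpbMbs` values are taken from Annex A of the H.264 specification as I understand it, so double-check them against the spec before relying on them. `h264_level`, `max_dpb_mbs` and `max_dpb_frames` are hypothetical names, and HEVC uses a different rule that is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A few H.264 levels relevant for streaming resolutions (hypothetical enum).
enum class h264_level { l4_0, l4_1, l4_2, l5_0, l5_1 };

// MaxDpbMbs per level, as listed in H.264 Table A-1.
inline uint32_t
max_dpb_mbs(h264_level level) {
  switch (level) {
    case h264_level::l4_0: return 32768;
    case h264_level::l4_1: return 32768;
    case h264_level::l4_2: return 34816;
    case h264_level::l5_0: return 110400;
    case h264_level::l5_1: return 184320;
  }
  return 0;
}

// Largest DPB size in frames that a decoder conforming to the given level
// must support: min(MaxDpbMbs / (width_in_mbs * height_in_mbs), 16).
inline uint32_t
max_dpb_frames(h264_level level, uint32_t width, uint32_t height) {
  assert(width > 0 && height > 0);
  const uint32_t mbs_w = (width + 15) / 16;   // macroblocks are 16x16 pixels
  const uint32_t mbs_h = (height + 15) / 16;
  return std::min<uint32_t>(max_dpb_mbs(level) / (mbs_w * mbs_h), 16);
}
```

Under these numbers, 3840x2160 at level 5.1 gives 5 frames, while 1920x1080 gives 4 frames at level 4 and 13 frames at level 5.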

From review

Look into entropyCodingMode encoding parameter
Proper cleanup in create_encoder()
Look into last_encoder_probe_supported_invalidate_ref_frames, check whether it needs to be changed in multiple places
Lock h264 into High profile

src/nvenc/nvenc_base.cpp

src/platform/windows/display_vram.cpp

src/nvenc/nvenc_base.cpp

src/video.cpp

ns6089 · 2023-07-11T12:05:40Z

@cgutman, need your opinion on colorspace refactoring a2f34ab. When you have time, it's not blocking me.

ns6089 · 2023-07-11T16:01:17Z

> @cgutman, need your opinion on colorspace refactoring a2f34ab. When you have time, it's not blocking me.

Or better wait until I finish the Linux part, it approaches things slightly differently and might need adjustments.

ns6089 · 2023-07-18T12:26:03Z

New options page (not yet pushed to this PR). Some wording may still change, and I need to run a few tests for better default QP values. But conceptually it should be done, and I think I will be able to backport it to the ffmpeg nvenc backend so we don't end up with two pages.

src/platform/linux/vaapi.cpp

cgutman · 2023-08-04T02:03:42Z

LGTM. Let's try to get this refactoring in soon if possible. I've got some AV1 and client-side cursor changes that I'd like to work on, but I don't want to stomp all over your work.

If you can back out any debugging changes or WIP test changes (the scheduling priority revert?), I can test this across all my systems and we can get it merged.

ns6089 · 2023-08-04T12:27:48Z

> LGTM. Let's try to get this refactoring in soon if possible. I've got some AV1 and client-side cursor changes that I'd like to work on, but I don't want to stomp all over your work.
>
> If you can back out any debugging changes or WIP test changes (the scheduling priority revert?), I can test this across all my systems and we can get it merged.

Understandable. Wired the encoder to use the existing configuration page, since some new options pretty much depend on whether I can make it work with realtime priority (and priority selection itself will add another option).

Haven't tested native HDR (only SDR in BT.2020) or the CUDA path on Linux (didn't make drastic changes there, but there's always a chance). Other than that, it should be alright to merge in its current state. The rest of the features shouldn't produce conflicts, and can be done in later PRs.

third-party/nv-codec-headers

src/nvenc/nvenc_base.cpp

src/nvenc/nvenc_utils.cpp

src/video_colorspace.cpp

cgutman · 2023-08-13T10:40:52Z

The latest changes look good. Once you squash the fixup commits and rebase, I'll do a final testing pass on my local machines and we can get this in.

Shouldn't matter on x64 since everything is fastcall here, but cdecl is the correct declaration.

ns6089 · 2023-08-13T10:48:40Z

Squashed and rebased. Also added a small "fix" for nvapi. It doesn't affect anything since we're strictly x64, so there's no point in opening a full pull request for it.

cgutman · 2023-08-13T12:00:46Z

Everything looks good in my tests:

HDR with and without NVENC
NvFBC on Linux
All valid colorspace and color range combos