Chinese users may want to update their package mirrors first.
First create a Conda environment:

```bash
conda env create -f environment.yml  # create the pqe env
conda activate pqe
```
Then install PyTorch v1, MMCV v1, and MMEditing v0. My commands:

```bash
# please refer to https://pytorch.org
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
# or
#pip install openmim
#mim install mmcv-full

# install the submodule mmediting in editable mode
pip install -e mmediting

# verify
#cd ~
#python -c "import mmedit; print(mmedit.__version__)"
```
Source: homepage.
File tree before compression:

```text
div2k
|-- train
|   `-- 0{001,002,...,800}.png
`-- valid
    `-- 0{801,802,...,900}.png
```
Source: GitHub.
File tree before compression:

```text
flickr2k
`-- 00{0001,0002,...,2650}.png
```
Flickr2K does not provide an official split into training and testing sets. During compression, we create two annotation files for training and testing: the images `00{0001,...,1988}.png` are used for training, while `00{1989,...,2650}.png` are reserved for testing. For more details, please refer to `powerqe/tools/data/compress_img.py`.
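A minimal sketch of the split described above (for illustration only; the actual logic lives in `compress_img.py`):

```python
# Illustrative only: reproduce the Flickr2K train/test split described above.
train_names = [f"{i:06d}.png" for i in range(1, 1989)]    # 000001.png ... 001988.png
test_names = [f"{i:06d}.png" for i in range(1989, 2651)]  # 001989.png ... 002650.png
```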
Source: homepage.
File tree before compression:

```text
vimeo_triplet
|-- tri_trainlist.txt
|-- tri_testlist.txt
`-- sequences
    |-- 00001
    |   |-- 0001
    |   |   `-- im{1,2,3}.png
    |   |-- 0002
    |   |-- ...
    |   `-- 1000
    |-- 00002
    |-- ...
    `-- 00078
```
To shorten the validation time during training, we select the 42 sequences matched by `00001/*` from `tri_testlist.txt` (3782 sequences in total) and record them in a new list, `tri_validlist.txt`, for validation.
File tree before compression:

```text
vimeo_septuplet
|-- sep_trainlist.txt
|-- sep_testlist.txt
`-- sequences
    |-- 00001
    |   |-- 0001
    |   |   `-- im{1,2,...,7}.png
    |   |-- 0002
    |   |-- ...
    |   `-- 1000
    |-- 00002
    |-- ...
    `-- 00096
```
To shorten the validation time during training, we select the 58 sequences matched by `00001/*` from `sep_testlist.txt` (7824 sequences in total) and record them in a new list, `sep_validlist.txt`, for validation; see the sketch below.
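The following sketch is my assumption of how such a validation list can be constructed, not necessarily the repo's exact code; the triplet list can be built the same way:

```python
# Illustrative sketch: keep only the test sequences under 00001/ for validation.
with open("data/vimeo_septuplet/sep_testlist.txt") as f:
    seqs = [line.strip() for line in f if line.strip()]

valid = [s for s in seqs if s.startswith("00001/")]  # 58 sequences

with open("data/vimeo_septuplet/sep_validlist.txt", "w") as f:
    f.write("\n".join(valid) + "\n")
```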
Source: GitHub.
File tree before compression:

```text
mfqev2_planar
|-- train
|   |-- 11skateboarding-10_1920x1080_87.yuv
|   `-- ...
`-- test
    |-- BQMall_832x480_600.yuv
    `-- ...
```
Unlike the Vimeo-90K dataset, which consists of images, the MFQEv2 dataset consists of planar files. Consequently, the ground truth (GT) images will be generated from these planar files using the YCbCr 420P to RGB24 conversion during compression.
Moreover, each sequence in the MFQEv2 dataset contains a varying number of images. We generate a maximum of 300 images from each planar file.
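For reference, here is a minimal sketch of that conversion (an illustration, not the repo's exact implementation), reading one frame of a 420P planar file with OpenCV:

```python
import cv2
import numpy as np

def read_yuv420p_frame(path, width, height, frame_idx=0):
    """Read the frame_idx-th YCbCr 420P frame of a planar file as an RGB24 array."""
    frame_size = width * height * 3 // 2  # Y plane + subsampled U and V planes
    with open(path, "rb") as f:
        f.seek(frame_idx * frame_size)
        buf = np.frombuffer(f.read(frame_size), dtype=np.uint8)
    yuv = buf.reshape(height * 3 // 2, width)  # I420 layout expected by OpenCV
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB_I420)

# e.g., rgb = read_yuv420p_frame("mfqev2_planar/test/BQMall_832x480_600.yuv", 832, 480)
```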
The minimum patch size for all video datasets should be no smaller than 256, as required by BasicVSR++. We therefore filter out videos from the training set whose height or width is smaller than 256.
We also exclude `Traffic_2560x1600_150` and `PeopleOnStreet_2560x1600_150` from testing, since they are too large for most GPUs.
Better Portable Graphics (BPG) is an image format based on the intra-frame encoding of the High Efficiency Video Coding (HEVC) standard.
Please refer to the official site and the GitHub mirror for instructions. The following is my personal experience of using libbpg v0.9.8.
Clone:

```bash
cd data
git clone --depth=1 https://github.com/mirrorer/libbpg.git libbpg
```
Modify `libbpg/Makefile`:

- Comment out `USE_X265=y` and uncomment `USE_JCTVC=y`: we want to use JCTVC instead of x265.
- Comment out `USE_BPGVIEW=y`: we do not need BPGView.
Build:

```bash
cd data/libbpg
make clean
make
```

There may be errors during the build; install the required dependencies according to the error messages. For example:

```bash
sudo apt-get install libpng-dev
sudo apt-get install libjpeg-dev

# build again
make clean
make
```
Basic usage:

```bash
bpgenc [-q quality] -o bpg_path src_path  # src image -> bpg
bpgdec -o tar_path bpg_path  # bpg -> tar image
```

Check `bpgenc -h` and `bpgdec -h` for detailed usage.
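For instance, a round trip through BPG at QP 37 can be scripted like this (a sketch; the file names and QP are assumptions, not the repo's exact invocation):

```python
import subprocess

src, bpg, tar = "0001.png", "0001.bpg", "0001_comp.png"
subprocess.run(["data/libbpg/bpgenc", "-q", "37", "-o", bpg, src], check=True)  # png -> bpg
subprocess.run(["data/libbpg/bpgdec", "-o", tar, bpg], check=True)              # bpg -> png
```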
Run:

```bash
conda activate pqe &&\
python tools/data/compress_img.py --codec bpg --dataset div2k
```
Resulting file tree:

```text
powerqe
|-- data
|   |-- libbpg
|   |   |-- bpgdec
|   |   `-- bpgenc
|   `-- {div2k,div2k_lq/bpg/qp37}
|       |-- train
|       |   `-- 0{001,002,...,800}.png
|       `-- valid
|           `-- 0{801,802,...,900}.png
`-- tmp/div2k_lq/bpg/qp37  # can be deleted after compression
    |-- train
    |   `-- 0{001,002,...,800}.bpg
    `-- valid
        `-- 0{801,802,...,900}.bpg
```
HM is the reference software for High Efficiency Video Coding (HEVC).
Please refer to the official site and the documentation (e.g., `HM/doc/software-manual.pdf`) for instructions. The following is my personal experience of using HM 18.0.
```bash
cd data
git clone -b HM-18.0 --depth=1 https://vcgit.hhi.fraunhofer.de/jvet/HM.git hm18.0
cd hm18.0
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j
```
```bash
conda activate pqe &&\
python tools/data/compress_video.py\
 --dataset vimeo-triplet
```
Resulting file tree:

```text
powerqe
|-- data
|   |-- hm18.0
|   |   |-- bin/TAppEncoderStatic
|   |   `-- cfg/encoder_lowdelay_P_main.cfg
|   `-- {vimeo_triplet/sequences,vimeo_triplet_lq/hm18.0/ldp/qp37}
|       |-- 00001
|       |   |-- 0001
|       |   |   `-- im{1,2,3}.png
|       |   |-- 0002
|       |   |-- ...
|       |   `-- 1000
|       |-- 00002
|       |-- ...
|       `-- 00078
`-- tmp  # can be deleted after compression
    |-- vimeo_triplet_planar
    |   |-- 00001
    |   |   `-- {0001,0002,...,1000}.yuv
    |   |-- 00002
    |   |-- ...
    |   `-- 00078
    |-- vimeo_triplet_bit/hm18.0/ldp/qp37
    |   |-- 00001
    |   |   `-- {0001,0002,...,1000}.bin
    |   |-- 00002
    |   |-- ...
    |   `-- 00078
    `-- vimeo_triplet_comp_planar/hm18.0/ldp/qp37
        |-- 00001
        |   |-- {0001,0002,...,1000}.log
        |   `-- {0001,0002,...,1000}.yuv
        |-- 00002
        |-- ...
        `-- 00078
```
Configuration inheritance is widely used in PowerQE to ensure consistency among configurations.
To obtain a clean configuration without inheritance:

```bash
conda activate pqe
python -c "from mmcv import Config; cfg = Config.fromfile('configs/arcnn/div2k_lmdb.py'); print(cfg.pretty_text)"
```
Due to the padding involved in upsampling, the error at image borders is significant. PowerQE follows the common practice in MMEditing and crops image borders before evaluation.
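As an illustration (a sketch assuming uint8 inputs, not MMEditing's exact metric code), PSNR with border cropping looks like this:

```python
import numpy as np

def psnr_crop_border(img, gt, crop_border=4):
    """PSNR between two uint8 images after discarding crop_border pixels per side."""
    if crop_border > 0:
        img = img[crop_border:-crop_border, crop_border:-crop_border, ...]
        gt = gt[crop_border:-crop_border, crop_border:-crop_border, ...]
    mse = np.mean((img.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0**2 / mse)
```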
When test-time unfolding is used, evaluation is conducted patch by patch to save memory; the accuracy may also drop.
Some approaches such as MFQEv2[^1] take advantage of the frame-wise quality fluctuation caused by the QP offset. We consider key frames to be those with the locally lowest QP. Note that we do not use PSNR here, since PSNR also correlates with the content, while QP correlates only with the compression. This strategy is effective for the low-delay P configuration, which turns off rate control and fixes the QP.
You can find `QP`, `IntraQPOffset`, `QPoffset`, `QPOffsetModelOff`, and `QPOffsetModelScale` in the configuration, i.e., `data/hm18.0/cfg/encoder_lowdelay_P_main.cfg`. In addition, you can find the frame QPs in the log files. How do `QP`, `IntraQPOffset`, `QPoffset`, `QPOffsetModelOff`, and `QPOffsetModelScale` determine the final frame QPs?
- POC 0 (I-SLICE): `QP + IntraQPOffset`
- POC 1, 2, ... (P-SLICE): `QP + QPoffset + QPOffsetModelOff + QPOffsetModelScale * QP`
For example, suppose `QP`, `IntraQPOffset`, `QPoffset`, `QPOffsetModelOff`, and `QPOffsetModelScale` are `37`, `-1`, `[5,4,5,4,5,4,5,1]`, `[-6.5,...,-6.5,0]`, and `[0.2590,...,0.2590,0]`, respectively. Then the QP values for POC 0 to 8 are 36, 45, 44, 45, 44, 45, 44, 45, and 38, respectively. Since a lower QP corresponds to higher compression quality, POC 0, 2, 4, 6, and 8 are key frames.
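The arithmetic can be checked with a short script (a sketch of the formulas above; the GOP indexing is my assumption for this configuration):

```python
import math

QP = 37
INTRA_QP_OFFSET = -1
QP_OFFSET = [5, 4, 5, 4, 5, 4, 5, 1]  # one entry per GOP position (GOPSize = 8)
QP_OFFSET_MODEL_OFF = [-6.5] * 7 + [0]
QP_OFFSET_MODEL_SCALE = [0.2590] * 7 + [0]

def frame_qp(poc):
    if poc == 0:  # I-SLICE
        return QP + INTRA_QP_OFFSET
    idx = (poc - 1) % len(QP_OFFSET)  # position within the GOP
    return math.floor(
        QP + QP_OFFSET[idx] + QP_OFFSET_MODEL_OFF[idx] + QP_OFFSET_MODEL_SCALE[idx] * QP
    )

print([frame_qp(poc) for poc in range(9)])  # [36, 45, 44, 45, 44, 45, 44, 45, 38]
```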
The configuration can vary among HM versions. In HM 16.5, the QP offsets are set to `[5,4,5,1]` (GOPSize: 4). You can also find a configuration with GOPSize 4 for HM 18.0 at `data/hm18.0/cfg/misc/encoder_lowdelay_P_main_GOP4.cfg`.
LMDB can be effectively utilized to accelerate IO operations, particularly for storing training patches.
Pros:
- Improved training speed: By working with small patches instead of larger images, the training process can be significantly expedited.
- Reduced CPU and GPU resource usage: Processing smaller patches instead of entire images alleviates the burden on both the CPU and GPU, resulting in lower CPU utilization and reduced GPU memory consumption.
- Unified image format: images of different formats (PNG, JPG, etc.) are all stored as PNG within the LMDB files.
- Consolidated patch storage: All training patches are conveniently packed into a single LMDB file, facilitating organization and access.
Cons:
- Increased memory requirements: Prior to training, the LMDB files need to be loaded into memory, which can result in higher memory usage compared to directly working with images.
- Additional preparation cost: generating the LMDB files requires extra time, computation, and storage.
- Fixed cropping method: Once the LMDB file is generated, the cropping method employed for extracting patches from images becomes fixed and cannot be easily modified without regenerating the LMDB file.
Run for the DIV2K dataset:

```bash
conda activate pqe &&\
python tools/data/prepare_dataset.py\
 --dataset div2k
```
Resulting file tree:

```text
powerqe
|-- data/lmdb
|   |-- div2k/train.lmdb
|   `-- div2k_lq/bpg/qp37/train.lmdb
`-- tmp/patches  # can be deleted
    |-- div2k/train
    `-- div2k_lq/bpg/qp37/train
```
For the configuration file with LMDB loading, see `configs/arcnn/div2k_lmdb.py`.
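To give a sense of the access pattern (a sketch; the patch key is hypothetical and PowerQE's dataloader may differ), reading one stored patch back looks like this:

```python
import cv2
import lmdb
import numpy as np

env = lmdb.open("data/lmdb/div2k/train.lmdb", readonly=True, lock=False)
with env.begin() as txn:
    buf = txn.get(b"0001_s001")  # hypothetical patch key
img = cv2.imdecode(np.frombuffer(buf, dtype=np.uint8), cv2.IMREAD_COLOR)  # PNG bytes -> array
```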
x265 is a HEVC video encoder application library. Encoding videos with x265 can be much faster than with HM. x265 is supported by FFmpeg via libx265. As indicated by this paper[^2], the following script can generate compressed videos that closely resemble the output of HM:
```bash
ffmpeg -video_size <WIDTH>x<HEIGHT>\
 -i <INPUT>\
 -vcodec libx265\
 -qp <QP>\
 -x265-params <OPTIONS>\
 <OUTPUT>
```

with the options:

```bash
<OPTIONS>=keyint=7:min-keyint=7:no-scenecut:me=full:subme=7:bframes=0:qp=<QP>
```
For research purposes, we need to control the distortion. Therefore, it is better to use a specific configuration of a stable version of the reference software; for example, the low-delay P configuration of HM 18.0 to compress videos. In this configuration, many hyperparameters, including the frame-level QP offset, are set explicitly. The above script does not set these parameters.
The QP offset is proposed to achieve better RD performance[^3]. Some approaches such as MFQEv2[^1] take advantage of the frame-wise quality fluctuation caused by the QP offset.
First install libbpg via Homebrew:

```bash
brew install libbpg
#brew list
```

Then replace the `enc_path` and `dec_path` in the script `tools/data/compress_img.py`:
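For instance (a hypothetical edit; after the Homebrew install, `bpgenc` and `bpgdec` should be on the `PATH`):

```python
# Hypothetical change inside tools/data/compress_img.py: point the codec
# paths at the Homebrew-installed binaries instead of the local build.
enc_path = "bpgenc"  # was something like "data/libbpg/bpgenc"
dec_path = "bpgdec"  # was something like "data/libbpg/bpgdec"
```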
PowerQE follows MMCV and MMEditing in supporting the pre-commit hook. Installation:

```bash
conda activate pqe
pip install -U pre-commit
pre-commit install
```

On every commit, the linters and formatters will be enforced. You can also run the hooks manually:

```bash
pre-commit run --all-files
```
For example, the anchor (ID) of this paragraph is `markdown-paragraph-id`. One can jump from another Markdown file to this paragraph via `docs/develop.md#markdown-paragraph-id`.
Note that Markdown converts the heading text to lowercase, removes any non-alphanumeric characters, and replaces spaces with hyphens. For example, if we have two paragraphs named "Markdown paragraph: ID" and "Markdown paragraph ID", their corresponding IDs will be `#markdown-paragraph-id` and `#markdown-paragraph-id-1`, respectively.
It's worth noting that the exact algorithm for generating the unique identifier can vary between Markdown renderers and may depend on implementation details. You can check the IDs via GitHub or the VS Code TOC.
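A sketch that approximates the behavior described above (an approximation; real renderers differ in details):

```python
import re

def heading_to_id(text, seen):
    """Approximate GitHub-style anchor generation for a heading."""
    slug = re.sub(r"[^\w\- ]", "", text.lower()).replace(" ", "-")
    if slug in seen:
        seen[slug] += 1
        return f"{slug}-{seen[slug]}"
    seen[slug] = 0
    return slug

seen = {}
print(heading_to_id("Markdown paragraph: ID", seen))  # markdown-paragraph-id
print(heading_to_id("Markdown paragraph ID", seen))   # markdown-paragraph-id-1
```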
- Do not refactor the code arbitrarily; as long as the original code can achieve the desired functionality, it is sufficient.
- Focus on new features and new architectures; only new things have high value.
Footnotes

[^1]: MFQE 2.0: A New Approach for Multi-frame Quality Enhancement on Compressed Video, 2019.
[^2]: Leveraging Bitstream Metadata for Fast and Accurate Video Compression Correction, 2022.
[^3]: Proposal JCTVC-X0038.