PowerQE V3 Document

Install dependency

Chinese users may first update mirrors:

First create a Conda environment:

conda env create -f environment.yml  # create the pqe env
conda activate pqe

Then install PyTorch v1, MMCV v1, and MMEditing v0. The commands I used:

# please refer to
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url

pip install mmcv-full -f
# or
#pip install openmim
#mim install mmcv-full

# install the submodule mmediting in the editable mode
pip install -e mmediting

# verify
#cd ~
#python -c "import mmedit; print(mmedit.__version__)"

Prepare data

Raw dataset


DIV2K

Source: homepage.

File tree before compression:

`-- train
`   `-- 0{001,002,...,800}.png
`-- valid
    `-- 0{801,802,...,900}.png


Flickr2K

Source: GitHub.

File tree before compression:

`-- 00{0001,0002,...,2650}.png

Flickr2K does not have a division into training and testing sets. During the compression, we create two annotation files for training and testing. The images with filenames 00{0001,...,1988}.png are used for training, while 00{1989,...,2650}.png are reserved for testing. For more details, please refer to the file powerqe/tools/data/
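
The split described above can be sketched as follows. This is only a minimal illustration and not the script under powerqe/tools/data/; the function name `split_flickr2k` and the returned list format are assumptions.

```python
def split_flickr2k(first=1, last=2650, last_train=1988):
    """Return (train, test) filename lists for Flickr2K.

    Filenames are six digits wide (e.g., 000001.png). Images up to
    `last_train` go to training; the rest go to testing.
    """
    names = [f"{idx:06d}.png" for idx in range(first, last + 1)]
    num_train = last_train - first + 1
    return names[:num_train], names[num_train:]


train, test = split_flickr2k()
print(len(train), train[0], train[-1])
print(len(test), test[0], test[-1])
```

Each list can then be written to its own annotation file, one filename per line.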


Vimeo-90K

Source: homepage.

Triplet dataset

File tree before compression:

`-- tri_trainlist.txt
`-- tri_testlist.txt
`-- sequences
    `-- 00001
    `   `-- 0001
    `   `   `-- im{1,2,3}.png
    `   `-- 0002
    `   `-- ...
    `   `-- 1000
    `-- 00002
    `-- ...
    `-- 00078

To shorten the validation time during training, we select the 42 sequences matching 00001/* from tri_testlist.txt (3782 sequences in total) and save them as tri_validlist.txt.

Septuplet dataset

File tree before compression:

`-- sep_trainlist.txt
`-- sep_testlist.txt
`-- sequences
    `-- 00001
    `   `-- 0001
    `   `   `-- im{1,2,...,7}.png
    `   `-- 0002
    `   `-- ...
    `   `-- 1000
    `-- 00002
    `-- ...
    `-- 00096

To shorten the validation time during training, we select the 58 sequences matching 00001/* from sep_testlist.txt (7824 sequences in total) and save them as sep_validlist.txt.
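
Both validation lists (triplet and septuplet) can be derived from the test lists with a simple prefix filter. The sketch below assumes each line of a test list is a relative path such as 00001/0004; the helper name `select_validation` and the toy entries are hypothetical.

```python
def select_validation(test_entries, prefix="00001/"):
    """Keep the test-list entries whose path starts with `prefix`."""
    stripped = (entry.strip() for entry in test_entries)
    return [entry for entry in stripped if entry.startswith(prefix)]


# Toy stand-in for the lines read from tri_testlist.txt or sep_testlist.txt.
toy_test_list = ["00001/0004\n", "00001/0022\n", "00002/0005\n", "00010/0001\n"]
valid = select_validation(toy_test_list)
print(valid)
```

Writing `valid` back to disk, one entry per line, yields a file in the same format as the original lists.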


MFQEv2

Source: GitHub.

File tree before compression:

`-- train
`   `-- 11skateboarding-10_1920x1080_87.yuv
`   `-- ...
`-- test
    `-- BQMall_832x480_600.yuv
    `-- ...

Unlike the Vimeo-90K dataset, which consists of images, the MFQEv2 dataset consists of planar files. Consequently, the ground-truth (GT) images are generated from these planar files through YCbCr 4:2:0 planar to RGB24 conversion during compression.

Moreover, each sequence in the MFQEv2 dataset contains a varying number of images. We generate a maximum of 300 images from each planar file.

The patch size for all video datasets must be at least 256, as required by BasicVSR++. We therefore filter out videos from the training set if their height or width is smaller than 256.

We also exclude Traffic_2560x1600_150 and PeopleOnStreet_2560x1600_150 from testing since their resolution is too large for most GPUs.
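
The rules above (at most 300 images per planar file, and a minimum side length of 256 for training videos) can be sketched as follows. This assumes the filename pattern `<name>_<width>x<height>_<frames>.yuv` seen in the file tree, with the last field being the frame count, and 8-bit samples. All helper names are hypothetical, as is the last example filename.

```python
import re

MAX_FRAMES = 300  # at most 300 images are generated per planar file
MIN_SIDE = 256    # minimum height/width for training videos


def parse_yuv_name(filename):
    """Split '<name>_<width>x<height>_<frames>.yuv' into its fields."""
    match = re.fullmatch(r"(.+)_(\d+)x(\d+)_(\d+)\.yuv", filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    name, width, height, frames = match.groups()
    return name, int(width), int(height), int(frames)


def frame_bytes(width, height):
    """Bytes per 8-bit 4:2:0 frame: one full Y plane plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def frames_to_extract(filename):
    """Number of images to generate from one planar file."""
    frames = parse_yuv_name(filename)[3]
    return min(frames, MAX_FRAMES)


def keep_for_training(filename):
    """True if both height and width are at least MIN_SIDE."""
    _, width, height, _ = parse_yuv_name(filename)
    return min(width, height) >= MIN_SIDE


print(parse_yuv_name("11skateboarding-10_1920x1080_87.yuv"))
print(frames_to_extract("BQMall_832x480_600.yuv"))
print(keep_for_training("Example_416x240_100.yuv"))
```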



Better Portable Graphics (BPG) is an image format based on the intra-frame encoding of the High Efficiency Video Coding (HEVC) standard.

Please refer to the official site and the GitHub mirror for instructions. The following is my personal experience of using libbpg v0.9.8.



cd data
git clone --depth=1 libbpg

Modify libbpg/Makefile:

  • Comment out USE_X265=y and uncomment USE_JCTVC=y: We want to use JCTVC instead of x265.
  • Comment out USE_BPGVIEW=y: We do not need BPGView.


cd data/libbpg
make clean
make

There may be errors during the build. Install the required dependencies according to the error messages. For example:

sudo apt-get install libpng-dev
sudo apt-get install libjpeg-dev

# build again
make clean
make

Basic usage:

bpgenc [-q quality] -o bpg_path src_path  # src image -> bpg
bpgdec -o tar_path bpg_path  # bpg -> tar image

Check bpgenc -h and bpgdec -h for detailed usage.

Example: Compress the DIV2K dataset


conda activate pqe &&\
 python tools/data/ --codec bpg --dataset div2k

Resulting file tree:

`-- data
`   `-- libbpg
`   `   `-- bpgdec
`   `   `-- bpgenc
`   `-- {div2k,div2k_lq/bpg/qp37}
`       `-- train
`       `   `-- 0{001,002,...,800}.png
`       `-- valid
`           `-- 0{801,802,...,900}.png
`-- tmp/div2k_lq/bpg/qp37  # can be deleted after compression
    `-- train
    `   `-- 0{001,002,...,800}.bpg
    `-- valid
        `-- 0{801,802,...,900}.bpg


HM is the reference software for High Efficiency Video Coding (HEVC).

Please refer to the official site and the document (e.g., HM/doc/software-manual.pdf) for instructions. The following is my personal experience of using HM 18.0.

cd data
git clone -b HM-18.0 --depth=1 hm18.0

cd hm18.0
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j

Example: Compress the Vimeo-90K triplet dataset

conda activate pqe &&\
 python tools/data/\
 --dataset vimeo-triplet

Resulting file tree:

`-- data
`   `-- hm18.0
`   `   `-- bin/TAppEncoderStatic
`   `   `-- cfg/encoder_lowdelay_P_main.cfg
`   `-- {vimeo_triplet/sequences,vimeo_triplet_lq/hm18.0/ldp/qp37}
`       `-- 00001
`       `   `-- 0001
`       `   `   `-- im{1,2,3}.png
`       `   `-- 0002
`       `   `-- ...
`       `   `-- 1000
`       `-- 00002
`       `-- ...
`       `-- 00078
`-- tmp  # can be deleted after compression
    `-- vimeo_triplet_planar
    `   `-- 00001
    `   `   `-- {0001,0002,...,1000}.yuv
    `   `-- 00002
    `   `-- ...
    `   `-- 00078
    `-- vimeo_triplet_bit/hm18.0/ldp/qp37
    `   `-- 00001
    `   `   `-- {0001,0002,...,1000}.bin
    `   `-- 00002
    `   `-- ...
    `   `-- 00078
    `-- vimeo_triplet_comp_planar/hm18.0/ldp/qp37
        `-- 00001
        `   `-- {0001,0002,...,1000}.log
        `   `-- {0001,0002,...,1000}.yuv
        `-- 00002
        `-- ...
        `-- 00078



Obtain clean configuration without inheritance

Configuration inheritance is widely used in PowerQE to ensure consistency among configurations.

To obtain a clean configuration without inheritance:

conda activate pqe
python -c "from mmcv import Config; cfg = Config.fromfile('configs/arcnn/'); print(cfg.pretty_text)"
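
To illustrate what "without inheritance" means: a child configuration only states what differs from its base, and the clean configuration is the merged result. The following is a simplified sketch of such a merge using plain dictionaries; it is not MMCV's actual implementation, and the keys and values are made up for illustration.

```python
import copy


def merge_config(base, child):
    """Recursively overlay `child` on `base`; child values win on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base and child settings, for illustration only.
base = {"model": {"type": "ExampleNet", "channels": 64}, "total_iters": 1000}
child = {"model": {"channels": 32}}
clean = merge_config(base, child)
print(clean)
```

The merged dictionary contains every setting explicitly, which is what the printed clean configuration provides.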

Crop image border before evaluation

Due to the padding of upsampling, the error at borders is significant. PowerQE follows the common practice in MMEditing to crop image borders before evaluation.
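
The effect of border cropping can be illustrated with a toy example. This is a sketch on nested lists rather than the MMEditing evaluation code; the helper names and the pixel values are assumptions chosen so that all errors sit on the border.

```python
def crop_border(image, border):
    """Drop `border` pixels from every side of a 2-D image (list of rows)."""
    if border == 0:
        return [row[:] for row in image]
    return [row[border:-border] for row in image[border:-border]]


def mse(img_a, img_b):
    """Mean squared error between two equally sized 2-D images."""
    diffs = [(a - b) ** 2 for row_a, row_b in zip(img_a, img_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# 4x4 ground truth; the output is wrong by 4 on border pixels only.
gt = [[10] * 4 for _ in range(4)]
out = [[10 if 1 <= r <= 2 and 1 <= c <= 2 else 14 for c in range(4)]
       for r in range(4)]

print(mse(gt, out))                                  # error over the full image
print(mse(crop_border(gt, 1), crop_border(out, 1)))  # error after cropping 1 pixel
```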

Test time unfolding

With test-time unfolding, evaluation is conducted patch by patch to save memory. As a result, the accuracy may drop.
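
As a rough illustration of patch-based evaluation, the sketch below lists windows that tile an image so that each can be processed separately. It is a generic tiling scheme, not PowerQE's unfolding code; the patch size, the stride, and the function names are assumptions.

```python
def patch_starts(length, patch, stride):
    """Start offsets so that patches of size `patch` cover [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # final patch sits flush with the edge
    return starts


def unfold_windows(height, width, patch=256, stride=256):
    """List (top, left, bottom, right) windows that cover the image."""
    return [
        (top, left, min(top + patch, height), min(left + patch, width))
        for top in patch_starts(height, patch, stride)
        for left in patch_starts(width, patch, stride)
    ]


windows = unfold_windows(480, 832)
print(len(windows))
print(windows[0], windows[-1])
```

Only one window needs to reside in GPU memory at a time. The network no longer sees context beyond each window, which is one plausible reason for the accuracy drop.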


What are key frames

Some approaches, such as MFQEv2 [1], take advantage of the frame-wise quality fluctuation caused by the QP offset. We define key frames as those with the locally lowest QP. We do not use PSNR for this purpose since it also depends on the content, whereas QP depends only on the compression. This strategy works for the low-delay P configuration, which turns off rate control and fixes the QP.

You can find QP, IntraQPOffset, QPoffset, QPOffsetModelOff, and QPOffsetModelScale in the configuration, i.e., data/hm18.0/cfg/encoder_lowdelay_P_main.cfg. In addition, you can find the frame QPs in log files. How do QP, IntraQPOffset, QPoffset, QPOffsetModelOff, and QPOffsetModelScale determine the final frame QPs?

  • POC 0 (I-SLICE): QP + IntraQPOffset
  • POC 1, 2, ... (P-SLICE): QP + QPoffset + QPOffsetModelOff + QPOffsetModelScale * QP

For example, suppose QP, IntraQPOffset, QPoffset, QPOffsetModelOff, and QPOffsetModelScale are 37, -1, [5,4,5,4,5,4,5,1], [-6.5,...,-6.5,0], and [0.2590,...,0.2590,0], respectively. Then the QP values for POC 0 to 8 are 36, 45, 44, 45, 44, 45, 44, 45, and 38, respectively (for POC 1, 37 + 5 - 6.5 + 0.2590 × 37 ≈ 45.08, which gives 45). Since a lower QP corresponds to higher quality, POC 0, 2, 4, 6, and 8 are key frames.
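
The example above can be reproduced with a short script. This is a sketch of the two rules as stated in this document, not HM's source code; rounding to the nearest integer is an assumption that happens to match the listed values, and the function names are hypothetical.

```python
import math


def frame_qps(qp, intra_offset, offsets, model_offs, model_scales, num_frames):
    """Frame QPs for POC 0..num_frames-1 under the two rules listed above."""
    qps = [qp + intra_offset]  # POC 0 is the I-slice
    gop = len(offsets)
    for poc in range(1, num_frames):
        i = (poc - 1) % gop  # position inside the GOP
        value = qp + offsets[i] + model_offs[i] + model_scales[i] * qp
        qps.append(math.floor(value + 0.5))  # assumption: round to nearest
    return qps


def key_frames(qps):
    """POCs whose QP is lower than that of every existing neighbor."""
    keys = []
    for poc, q in enumerate(qps):
        left_ok = poc == 0 or q < qps[poc - 1]
        right_ok = poc == len(qps) - 1 or q < qps[poc + 1]
        if left_ok and right_ok:
            keys.append(poc)
    return keys


qps = frame_qps(37, -1, [5, 4, 5, 4, 5, 4, 5, 1],
                [-6.5] * 7 + [0], [0.2590] * 7 + [0], num_frames=9)
print(qps)              # [36, 45, 44, 45, 44, 45, 44, 45, 38]
print(key_frames(qps))  # [0, 2, 4, 6, 8]
```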

The configuration can vary among different HM versions. In HM 16.5, QP offsets are set to [5,4,5,1] (GOPSize: 4).

You can also find a configuration with GOPSize being 4 for HM 18.0 at data/hm18.0/cfg/misc/encoder_lowdelay_P_main_GOP4.cfg.

Use LMDB for faster IO

LMDB can accelerate IO, particularly when it is used to store training patches.


Pros:

  • Improved training speed: Working with small patches instead of full images can significantly speed up training.
  • Reduced CPU and GPU resource usage: Processing small patches instead of entire images lowers CPU utilization and GPU memory consumption.
  • Universal image format: All images, whether PNG, JPG, or another format, are stored as PNG within the LMDB files.
  • Consolidated patch storage: All training patches are packed into a single LMDB file, which simplifies organization and access.


Cons:

  • Increased memory requirements: The LMDB files need to be loaded into memory before training, which can result in higher memory usage than working directly with images.
  • Additional time, computation, and storage for generating the LMDB files.
  • Fixed cropping method: Once the LMDB file is generated, the way patches are cropped from images is fixed and cannot be changed without regenerating the LMDB file.

Run for the DIV2K dataset:

conda activate pqe &&\
 python tools/data/\
 --dataset div2k

Resulting file tree:

`-- data/lmdb
    `-- div2k/train.lmdb
    `-- div2k_lq/bpg/qp37/train.lmdb
`-- tmp/patches  # can be deleted
    `-- div2k/train
    `-- div2k_lq/bpg/qp37/train

For the configuration file with LMDB loading, see configs/arcnn/

Why do we not use x265

x265 is an HEVC video encoder application and library. Encoding videos with x265 can be much faster than with HM. FFmpeg supports x265 through libx265. As indicated by this paper [2], the following script can generate compressed videos that closely resemble the output of HM:

ffmpeg -video_size <WIDTH>x<HEIGHT>\
 -i <INPUT>\
 -vcodec libx265\
 -qp <QP>\
 -x265-params <OPTIONS>\

for options:


For research purposes, we need to control the distortion. Therefore, it is better to use a specific configuration of a stable version of the reference software. For example, we may use the low-delay P configuration of HM 18.0 to compress videos. This configuration sets many hyperparameters, including the frame-level QP offset, which the above script does not specify.

The QP offset was proposed to achieve better RD performance [3]. Some approaches, such as MFQEv2 [1], take advantage of the frame-wise quality fluctuation caused by the QP offset.

Use BPG on Mac

First install libbpg via Homebrew:

brew install libbpg
#brew list

Then replace the enc_path and dec_path in the script tools/data/


Use pre-commit hook for code check

PowerQE follows MMCV and MMEditing to support the pre-commit hook. Installation:

conda activate pqe
pip install -U pre-commit
pre-commit install

On every commit, the linters and formatters will run. You can also run the hooks manually:

pre-commit run --all-files


Markdown heading text anchors

For example, the anchor (ID) of this section is markdown-heading-text-anchors. One can jump from another Markdown file to this section by docs/

Note that Markdown will convert the heading text to lowercase, remove any non-alphanumeric characters, and replace spaces with hyphens. For example, if we have two paragraphs named Markdown paragraph: ID and Markdown paragraph ID, their corresponding IDs will be #markdown-paragraph-id and #markdown-paragraph-id-1, respectively.

It's worth noting that the exact algorithm for generating the unique identifier can vary between Markdown renderers and may depend on the specific implementation details of the renderer. We may check ID by GitHub or VS Code TOC.
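
The conversion described above can be approximated with a few lines of Python. This is an ASCII-only sketch of the stated rules (lowercase, drop punctuation, replace spaces with hyphens, number duplicates); actual renderers may differ in edge cases, and the function name is hypothetical.

```python
import re


def heading_anchors(headings):
    """Map heading texts to anchors, numbering duplicates as -1, -2, ..."""
    seen = {}
    anchors = []
    for text in headings:
        slug = text.strip().lower()
        slug = re.sub(r"[^a-z0-9 \-]", "", slug)  # drop punctuation (ASCII only)
        slug = slug.replace(" ", "-")
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        anchors.append(slug if count == 0 else f"{slug}-{count}")
    return anchors


anchors = heading_anchors(["Markdown paragraph: ID", "Markdown paragraph ID"])
print(anchors)  # ['markdown-paragraph-id', 'markdown-paragraph-id-1']
```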


  • Do not refactor the code arbitrarily; as long as the original code can achieve the desired functionality, it is sufficient.
  • Focus on new features and new architectures; only new things have high value.


  1. MFQE 2.0: A New Approach for Multi-frame Quality Enhancement on Compressed Video, 2019.

  2. Leveraging Bitstream Metadata for Fast and Accurate Video Compression Correction, 2022.

  3. Proposal JCTVC-X0038.