MinKyu Lee, Sangeek Hyun, Woojin Jun, Jae-Pil Heo*
Sungkyunkwan University
*: Corresponding Author
This is the official repository of "Auto-Encoded Supervision for Perceptual Image Super-Resolution".
⭐ If you find AESOP helpful, please consider giving this repository a star. Thanks!
This work tackles the fidelity objective in perceptual super-resolution (SR). Specifically, we address the shortcomings of the pixel-level $L_\text{p}$ loss ($L_\text{pix}$) in the GAN-based SR framework. Since $L_\text{pix}$ is known to trade off against perceptual quality, prior methods often multiply it by a small scale factor or apply low-pass filters. However, this work shows that these circumventions fail to address the fundamental factor that induces blurring. Accordingly, we focus on two points: 1) precisely discriminating the subcomponent of $L_\text{pix}$ that contributes to blurring, and 2) guiding only with the factor that is free from this trade-off. We show that both can be achieved in a surprisingly simple manner, with an Auto-Encoder (AE) pretrained with $L_\text{pix}$. Accordingly, we propose the Auto-Encoded Supervision for Optimal Penalization loss ($L_\text{AESOP}$), a novel loss function that measures distance in the AE space instead of the raw pixel space. Note that the AE space refers to the space after the decoder, not the bottleneck. By simply substituting $L_\text{pix}$ with $L_\text{AESOP}$, we can provide effective reconstruction guidance without compromising perceptual quality. Designed for simplicity, our method enables easy integration into existing SR frameworks. Experimental results verify that AESOP leads to favorable results in the perceptual SR task.
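A minimal sketch (not the official training code) of the substitution described above: the pixel-space reconstruction term of a typical GAN-based SR objective is swapped for $L_\text{AESOP}$, while the perceptual and adversarial terms stay untouched. `aesop_loss`, `perceptual_loss`, `gan_loss`, and the loss weights are placeholders, not values from the paper.

```python
def generator_loss(sr, hr, aesop_loss, perceptual_loss, gan_loss,
                   w_rec=1.0, w_per=1.0, w_gan=5e-3):
    # Before: w_rec * F.l1_loss(sr, hr)      (raw pixel-space guidance, L_pix)
    # After:  w_rec * aesop_loss(sr, hr)     (Auto-Encoded-space guidance, L_AESOP)
    return (w_rec * aesop_loss(sr, hr)
            + w_per * perceptual_loss(sr, hr)
            + w_gan * gan_loss(sr))
```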
- 2024-12-04: Repository created.
- 2025-02-07: Our paper has been accepted to CVPR 2025.
- Code will be released soon. Stay tuned!
(Fig. 1) Conceptual illustration of the proposed AESOP loss and the pixel-level $L_\text{p}$ reconstruction guidance employed in typical perceptual SR methods. (a) A fidelity-oriented SR network trained with $L_\text{pix}$ estimates the average over plausible solutions (i.e., the optimal fidelity point). Meanwhile, perceptual SR involves a range of multiple solutions scattered around the optimal fidelity point. Thus, we identify two fundamental components of a perceptual SR image: 1) the perceptual variance factor (red line), which possesses randomness and contributes to realistic textures, and 2) the fidelity bias term (orange dot), the residual blurry component of an SR image, which contributes to overall fidelity apart from the perceptual variance. (b) Typical perceptual SR methods adopt $L_\text{pix}$ for reconstruction guidance, which pushes the perceptual variance factor to vanish. Thus, when combined with perceptual-quality-oriented losses that encourage this variance factor, a conflict arises, leading to suboptimal performance. (c) In contrast, $L_\text{AESOP}$ penalizes only the fidelity bias-induced error while preserving these critical perceptual variance factors, ensuring improved fidelity without sacrificing perceptual quality.
(Fig. 2) Loss map comparison between $L_\text{AESOP}$ and $L_\text{pix}$. $L_\text{pix}$ indiscriminately penalizes all factors, including visually important fine-grained details. $L_\text{AESOP}$ penalizes only the fidelity bias factor.
(Left, Fig. 3) Conceptual illustration of the perception-distortion (PD) trade-off curves that $L_\text{AESOP}$ and $L_\text{pix}$ lead to.
(Right, Fig. 4) The optimization procedure of $L_\text{AESOP}$ and $L_\text{pix}$. Our $L_\text{AESOP}$ is distribution-preserving, while $L_\text{pix}$ is not.
AESOP takes the $L_\text{p}$ distance in the Auto-Encoded space (after the decoder, not the bottleneck) instead of the raw pixel space. The carefully designed Auto-Encoder architecture and pretraining objective enable AESOP to penalize only the fidelity bias factors (the Auto-Encoded image shown in Fig. 2) while retaining the perceptual variance. In this way, AESOP improves fidelity without degrading perceptual quality. (Shouldn't any distortion measure lead to degraded perception? See Appendix F of the paper.)
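Below is a minimal sketch of this idea: the distance is measured between decoder outputs of a frozen Auto-Encoder pretrained with $L_\text{pix}$, rather than between raw pixels. The L1 form of the distance, the frozen AE, and the loss weight are illustrative assumptions, not the official configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AESOPLoss(nn.Module):
    """Reconstruction loss measured in the Auto-Encoded (decoder-output) space."""

    def __init__(self, autoencoder: nn.Module, loss_weight: float = 1.0):
        super().__init__()
        self.ae = autoencoder.eval()          # L_pix-pretrained AE, kept frozen
        for p in self.ae.parameters():
            p.requires_grad_(False)
        self.loss_weight = loss_weight

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        ae_sr = self.ae(sr)                   # gradients flow back to the SR network
        with torch.no_grad():
            ae_hr = self.ae(hr)               # fidelity-bias reference
        # Distance after the decoder, not in the raw pixel space.
        return self.loss_weight * F.l1_loss(ae_sr, ae_hr)
```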
| Metric | Method | Set14 | Manga109 | General100 | Urban100 | DIV2K | BSD100 | LSDIR |
|---|---|---|---|---|---|---|---|---|
| AE-PSNR | ESRGAN | 30.280 | 31.165 | 32.663 | 27.198 | 31.668 | 28.991 | 27.636 |
| | SPSR | 30.602 | 31.351 | 32.670 | 27.508 | 31.737 | 29.029 | 27.881 |
| | LDL | 31.180 | 32.608 | 33.823 | 28.488 | 32.597 | 29.595 | 28.625 |
| | AESOP (Ours) | 31.341 | 32.843 | 33.956 | 28.529 | 32.740 | 29.737 | 28.812 |
| LR-PSNR | ESRGAN | 43.892 | 43.908 | 45.259 | 42.879 | 45.689 | 43.823 | 42.718 |
| | SPSR | 43.835 | 44.359 | 44.656 | 42.666 | 44.717 | 42.719 | 42.364 |
| | LDL | 46.497 | 47.603 | 48.184 | 45.975 | 47.793 | 45.307 | 45.295 |
| | AESOP (Ours) | 46.625 | 48.188 | 48.653 | 46.280 | 48.272 | 45.837 | 45.571 |
Quantitative comparison between AESOP (Ours) and baseline methods. AE-PSNR and LR-PSNR quantify how well an SR network estimates the fidelity bias. The best results of each group are highlighted in bold.
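For reference, a hedged sketch of how these two fidelity metrics can be computed. LR-PSNR follows the common definition (PSNR between the downsampled SR image and the LR input); AE-PSNR is read here as PSNR computed between the auto-encoded SR and HR images, i.e., PSNR restricted to the fidelity-bias component. The exact downsampling kernel and the AE-PSNR definition are assumptions, not taken from the official evaluation code.

```python
import torch
import torch.nn.functional as F

def psnr(a: torch.Tensor, b: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    mse = F.mse_loss(a, b)
    return 10.0 * torch.log10(max_val ** 2 / mse)

@torch.no_grad()
def ae_psnr(sr: torch.Tensor, hr: torch.Tensor, autoencoder) -> torch.Tensor:
    # PSNR measured in the Auto-Encoded (decoder-output) space.
    return psnr(autoencoder(sr), autoencoder(hr))

@torch.no_grad()
def lr_psnr(sr: torch.Tensor, lr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    # Downsample the SR output back to LR resolution and compare with the LR input.
    sr_down = F.interpolate(sr, scale_factor=1 / scale, mode="bicubic", antialias=True)
    return psnr(sr_down, lr)
```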
| Metrics | Benchmark | ESRGAN | SPSR | LDL | AESOP | AESOP | +GAN | LDL | AESOP* | AESOP |
|---|---|---|---|---|---|---|---|---|---|---|
| Backbone | | RRDB | RRDB | RRDB | RRDB | RRDB | RRDB | SwinIR | SwinIR | SwinIR |
| Patch size | | 128 | 128 | 128 | 128 | 256 | 256 | 256 | 256 | 256 |
| LPIPS | Set14 | 0.1241 | 0.1207 | 0.1132 | 0.1067 | 0.1053 | 0.1160 | 0.1091 | 0.1023 | 0.1027 |
| | Manga109 | 0.0649 | 0.0672 | 0.0544 | 0.0525 | 0.0494 | 0.0542 | 0.0469 | 0.0440 | 0.0461 |
| | General100 | 0.0879 | 0.0862 | 0.0796 | 0.0784 | 0.0734 | 0.0796 | 0.0740 | 0.0717 | 0.0710 |
| | Urban100 | 0.1229 | 0.1184 | 0.1084 | 0.1064 | 0.1033 | 0.1077 | 0.1021 | 0.0961 | 0.0945 |
| | DIV2K-val | 0.1154 | 0.1099 | 0.0999 | 0.0977 | 0.0936 | 0.1038 | 0.0944 | 0.0909 | 0.0893 |
| | BSD100 | 0.1616 | 0.1609 | 0.1535 | 0.1515 | 0.1443 | - | 0.1572 | 0.1441 | 0.1385 |
| | LSDIR | 0.1378 | 0.1312 | 0.1180 | 0.1152 | 0.1123 | - | 0.1132 | 0.1094 | 0.1071 |
| DISTS | Set14 | 0.0951 | 0.0920 | 0.0866 | 0.0852 | 0.0825 | 0.0930 | 0.0869 | 0.0809 | 0.0819 |
| | Manga109 | 0.0471 | 0.0463 | 0.0355 | 0.0360 | 0.0356 | 0.0365 | 0.0315 | 0.0327 | 0.0328 |
| | General100 | 0.0874 | 0.0884 | 0.0801 | 0.0798 | 0.0773 | 0.0835 | 0.0794 | 0.0768 | 0.0762 |
| | Urban100 | 0.0880 | 0.0849 | 0.0793 | 0.0793 | 0.0768 | 0.0835 | 0.0800 | 0.0751 | 0.0742 |
| | DIV2K-val | 0.0593 | 0.0546 | 0.0526 | 0.0518 | 0.0484 | 0.0531 | 0.0507 | 0.0469 | 0.0459 |
| | BSD100 | 0.1165 | 0.1176 | 0.1163 | 0.1117 | 0.1089 | - | 0.1185 | 0.1078 | 0.1072 |
| | LSDIR | 0.0764 | 0.0699 | 0.0650 | 0.0641 | 0.0612 | - | 0.0650 | 0.0601 | 0.0591 |
| PSNR | Set14 | 26.594 | 26.860 | 27.228 | 27.361 | 27.246 | 27.282 | 27.526 | 27.822 | 27.421 |
| | Manga109 | 28.413 | 28.561 | 29.620 | 29.973 | 29.747 | 29.345 | 30.143 | 30.453 | 30.061 |
| | General100 | 29.425 | 29.424 | 30.289 | 30.482 | 30.251 | 30.104 | 30.441 | 30.752 | 30.401 |
| | Urban100 | 24.365 | 24.804 | 25.459 | 25.630 | 25.541 | 25.736 | 26.231 | 26.398 | 26.148 |
| | DIV2K-val | 28.175 | 28.182 | 28.819 | 29.079 | 28.910 | 28.784 | 29.117 | 29.543 | 29.137 |
| | BSD100 | 25.313 | 25.501 | 25.954 | 26.080 | 25.904 | - | 26.216 | 26.405 | 25.930 |
| | LSDIR | 23.882 | 24.232 | 24.663 | 24.933 | 24.845 | - | 25.129 | 25.419 | 25.038 |
| SSIM | Set14 | 0.7144 | 0.7254 | 0.7358 | 0.7402 | 0.7371 | 0.7407 | 0.7478 | 0.7578 | 0.7438 |
| | Manga109 | 0.8595 | 0.8590 | 0.8734 | 0.8827 | 0.8802 | 0.8796 | 0.8880 | 0.8949 | 0.8880 |
| | General100 | 0.8095 | 0.8091 | 0.8280 | 0.8335 | 0.8269 | 0.8305 | 0.8347 | 0.8415 | 0.8328 |
| | Urban100 | 0.7341 | 0.7474 | 0.7661 | 0.7724 | 0.7697 | 0.7786 | 0.7918 | 0.7947 | 0.7884 |
| | DIV2K-val | 0.7759 | 0.7720 | 0.7897 | 0.7978 | 0.7951 | 0.7911 | 0.8011 | 0.8121 | 0.8023 |
| | BSD100 | 0.6527 | 0.6596 | 0.6813 | 0.6841 | 0.6783 | - | 0.6923 | 0.6982 | 0.6813 |
| | LSDIR | 0.6866 | 0.6966 | 0.7117 | 0.7220 | 0.7202 | - | 0.7316 | 0.7397 | 0.7289 |
Quantitative comparison between AESOP (Ours) and baseline methods on standard benchmark datasets. The best results of each group are highlighted in bold. AESOP* indicates training for only 200K iterations.
This project is built on BasicSR and also draws on DRCT, LDL, and SwinIR.
Please contact me via [email protected] for any inquiries.
Please consider citing us if you find our paper useful in your research.
@article{lee2024auto,
title={Auto-Encoded Supervision for Perceptual Image Super-Resolution},
author={Lee, MinKyu and Hyun, Sangeek and Jun, Woojin and Heo, Jae-Pil},
journal={arXiv preprint arXiv:2412.00124},
year={2024}
}