
[CVPR2025] AESOP 🦊🐇: Auto-Encoded Supervision for Perceptual Image Super-Resolution


MinKyu Lee, Sangeek Hyun, Woojin Jun, Jae-Pil Heo*
Sungkyunkwan University
*: Corresponding Author

This is the official repository of Auto-Encoded Supervision for Perceptual Image Super-Resolution.
⭐ If you find AESOP helpful, please consider giving this repository a star. Thanks! 🤗

This work tackles the fidelity objective in perceptual super-resolution (SR). Specifically, we address the shortcomings of the pixel-level $L_\text{p}$ loss ($L_\text{pix}$) in the GAN-based SR framework. Since $L_\text{pix}$ is known to have a trade-off relationship with perceptual quality, prior methods often multiply it by a small scale factor or apply low-pass filters. However, this work shows that these circumventions fail to address the fundamental factor that induces blurring. Accordingly, we focus on two points: 1) precisely discriminating the subcomponent of $L_\text{pix}$ that contributes to blurring, and 2) guiding only with the factor that is free from this trade-off. We show that both can be achieved in a surprisingly simple manner, with an Auto-Encoder (AE) pretrained with $L_\text{pix}$. Based on this, we propose the Auto-Encoded Supervision for Optimal Penalization loss ($L_\text{AESOP}$), a novel loss function that measures distance in the AE space instead of the raw pixel space. Note that the AE space indicates the space after the decoder, not the bottleneck. By simply substituting $L_\text{pix}$ with $L_\text{AESOP}$, we can provide effective reconstruction guidance without compromising perceptual quality. Designed for simplicity, our method enables easy integration into existing SR frameworks. Experimental results verify that AESOP leads to favorable results in the perceptual SR task.
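
As a rough illustration of this idea, below is a minimal PyTorch-style sketch of a loss computed in the auto-encoded image space (after the decoder) rather than the raw pixel space, assuming a frozen auto-encoder pretrained with $L_\text{pix}$. The class name `AESOPLoss` and its arguments are placeholders for illustration, not the released implementation.

```python
import torch
import torch.nn as nn


class AESOPLoss(nn.Module):
    """Sketch: L1 distance between auto-encoded SR and HR images.

    `autoencoder` is assumed to be pretrained with a pixel-level Lp loss and is
    kept frozen; it only defines the supervision space.
    """

    def __init__(self, autoencoder: nn.Module, loss_weight: float = 1.0):
        super().__init__()
        self.ae = autoencoder.eval()
        for p in self.ae.parameters():
            p.requires_grad_(False)  # the AE is not updated during SR training
        self.loss_weight = loss_weight
        self.criterion = nn.L1Loss()

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        ae_sr = self.ae(sr)      # gradients flow through the frozen AE to the SR network
        with torch.no_grad():
            ae_hr = self.ae(hr)  # target in the AE space
        return self.loss_weight * self.criterion(ae_sr, ae_hr)
```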


News

  • 🎉 2024-12-04: Repository created.
  • 🎉 2025-02-07: Our paper has been accepted to CVPR2025!
  • ❌ Code will be updated soon. Stay tuned! ⭐


Overview

(Fig.1) Conceptual illustration of the proposed AESOP loss and the pixel-level $L_\text{p}$ reconstruction guidance employed in typical perceptual SR methods. (a) A fidelity-oriented SR network trained with $L_\text{pix}$ estimates the average over plausible solutions (i.e., the optimal fidelity point). Meanwhile, perceptual SR involves a range of multiple solutions standing around the optimal fidelity point. Thus, we identify two fundamental components of a perceptual SR image: 1) the perceptual variance factor (red line), a factor that possesses randomness and contributes to realistic textures, and 2) the fidelity bias term (orange dot), the residual blurry component of an SR image that contributes to the overall fidelity, apart from the perceptual variance. (b) Typical perceptual SR methods adopt $L_\text{pix}$ for reconstruction guidance, which pushes the perceptual variance factor to vanish. Thus, when combined with perceptual-quality-oriented losses that encourage this variance factor, a conflict arises, leading to suboptimal performance. (c) In contrast, $L_\text{AESOP}$ only penalizes the fidelity bias-induced error while preserving these critical perceptual variance factors. This ensures improved fidelity without sacrificing perceptual quality.


Intuitions on AESOP

(Fig.2) Loss map comparison between $L_\text{AESOP}$ and $L_\text{pix}$. $L_\text{pix}$ indiscriminately penalizes all factors, including the visually important fine-grained details. $L_\text{AESOP}$ only penalizes based on the fidelity-bias factor.

(Left, Fig.3) Conceptual illustration of the perception-distortion (PD) trade-off curve that $L_\text{AESOP}$ and $L_\text{pix}$ lead to.

(Right, Fig.4) The optimization procedure of $L_\text{AESOP}$ and $L_\text{pix}$. Our $L_\text{AESOP}$ is distribution preserving while $L_\text{pix}$ is not.

AESOP takes the $L_\text{p}$ distance in the auto-encoded space (after the decoder, not the bottleneck) instead of the raw pixel space. The carefully designed auto-encoder architecture and pretraining objective enable AESOP to penalize only the fidelity-bias factor (the auto-encoded image shown in Fig.2) while retaining the perceptual variance. In this way, AESOP improves fidelity without degrading perceptual quality. (Shouldn't any distortion measure lead to degraded perception? See Appendix F.)
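
To make the "simply substitute $L_\text{pix}$" point concrete, here is a hypothetical generator update for a GAN-based SR framework in which an AE-space loss (such as the sketch above) takes the place of the pixel loss. All module names and the adversarial weight are illustrative assumptions, not the paper's exact training recipe.

```python
def generator_step(generator, discriminator, lr, hr,
                   aesop_loss, perceptual_loss, gan_loss,
                   optimizer_g, gan_weight=5e-3):
    """Hypothetical GAN-SR generator update where L_AESOP replaces L_pix."""
    optimizer_g.zero_grad()
    sr = generator(lr)
    l_fid = aesop_loss(sr, hr)                 # fidelity guidance in the AE space
    l_percep = perceptual_loss(sr, hr)         # e.g. a VGG feature loss
    l_adv = gan_loss(discriminator(sr), True)  # adversarial (generator) term
    total = l_fid + l_percep + gan_weight * l_adv
    total.backward()
    optimizer_g.step()
    return total.detach()
```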

Benchmark Results

| Metric | Method | Set14 | Manga109 | General100 | Urban100 | DIV2K | BSD100 | LSDIR |
|---|---|---|---|---|---|---|---|---|
| AE-PSNR | ESRGAN | 30.280 | 31.165 | 32.663 | 27.198 | 31.668 | 28.991 | 27.636 |
| | SPSR | 30.602 | 31.351 | 32.670 | 27.508 | 31.737 | 29.029 | 27.881 |
| | LDL | 31.180 | 32.608 | 33.823 | 28.488 | 32.597 | 29.595 | 28.625 |
| | AESOP (Ours) | **31.341** | **32.843** | **33.956** | **28.529** | **32.740** | **29.737** | **28.812** |
| LR-PSNR | ESRGAN | 43.892 | 43.908 | 45.259 | 42.879 | 45.689 | 43.823 | 42.718 |
| | SPSR | 43.835 | 44.359 | 44.656 | 42.666 | 44.717 | 42.719 | 42.364 |
| | LDL | 46.497 | 47.603 | 48.184 | 45.975 | 47.793 | 45.307 | 45.295 |
| | AESOP (Ours) | **46.625** | **48.188** | **48.653** | **46.280** | **48.272** | **45.837** | **45.571** |

Quantitative comparison between AESOP (Ours) and baseline methods. AE-PSNR and LR-PSNR quantify how well an SR network estimates the fidelity bias. The best results of each group are highlighted in bold.
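
As a reference for how these metrics could be computed (our reading of the caption, not the official evaluation code): AE-PSNR is taken here as PSNR between the auto-encoded SR and HR images, and LR-PSNR as PSNR between the re-downscaled SR image and the LR input. The downscaling below is an approximation; the exact kernel used for evaluation may differ.

```python
import torch
import torch.nn.functional as F


def psnr(a: torch.Tensor, b: torch.Tensor, max_val: float = 1.0) -> float:
    """Plain PSNR between two images with values in [0, max_val]."""
    mse = F.mse_loss(a, b)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()


def ae_psnr(sr: torch.Tensor, hr: torch.Tensor, autoencoder: torch.nn.Module) -> float:
    """PSNR measured in the auto-encoded space (assumed definition)."""
    with torch.no_grad():
        return psnr(autoencoder(sr), autoencoder(hr))


def lr_psnr(sr: torch.Tensor, lr: torch.Tensor, scale: int = 4) -> float:
    """PSNR between the re-downscaled SR image and the LR input."""
    sr_down = F.interpolate(sr, scale_factor=1.0 / scale,
                            mode='bicubic', antialias=True)  # approximate bicubic downscale
    return psnr(sr_down, lr)
```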


| Metric | Benchmark | ESRGAN | SPSR | LDL | AESOP | AESOP | +GAN | LDL | AESOP* | AESOP |
|---|---|---|---|---|---|---|---|---|---|---|
| Backbone | | RRDB | RRDB | RRDB | RRDB | RRDB | SwinIR | SwinIR | SwinIR | SwinIR |
| Recon. Loss | | $L_\text{pix}$ | $L_\text{pix}$ | $L_\text{pix}$ | $L_\text{AESOP}$ | $L_\text{AESOP}$ | $L_\text{pix}$ | $L_\text{pix}$ | $L_\text{AESOP}$ | $L_\text{AESOP}$ |
| Patch | | 128 | 128 | 128 | 128 | 256 | 256 | 256 | 256 | 256 |
| LPIPS | Set14 | 0.1241 | 0.1207 | 0.1132 | 0.1067 | 0.1053 | 0.1160 | 0.1091 | 0.1023 | 0.1027 |
| | Manga109 | 0.0649 | 0.0672 | 0.0544 | 0.0525 | 0.0494 | 0.0542 | 0.0469 | 0.0440 | 0.0461 |
| | General100 | 0.0879 | 0.0862 | 0.0796 | 0.0784 | 0.0734 | 0.0796 | 0.0740 | 0.0717 | 0.0710 |
| | Urban100 | 0.1229 | 0.1184 | 0.1084 | 0.1064 | 0.1033 | 0.1077 | 0.1021 | 0.0961 | 0.0945 |
| | DIV2K-val | 0.1154 | 0.1099 | 0.0999 | 0.0977 | 0.0936 | 0.1038 | 0.0944 | 0.0909 | 0.0893 |
| | BSD100 | 0.1616 | 0.1609 | 0.1535 | 0.1515 | 0.1443 | - | 0.1572 | 0.1441 | 0.1385 |
| | LSDIR | 0.1378 | 0.1312 | 0.1180 | 0.1152 | 0.1123 | - | 0.1132 | 0.1094 | 0.1071 |
| DISTS | Set14 | 0.0951 | 0.0920 | 0.0866 | 0.0852 | 0.0825 | 0.0930 | 0.0869 | 0.0809 | 0.0819 |
| | Manga109 | 0.0471 | 0.0463 | 0.0355 | 0.0360 | 0.0356 | 0.0365 | 0.0315 | 0.0327 | 0.0328 |
| | General100 | 0.0874 | 0.0884 | 0.0801 | 0.0798 | 0.0773 | 0.0835 | 0.0794 | 0.0768 | 0.0762 |
| | Urban100 | 0.0880 | 0.0849 | 0.0793 | 0.0793 | 0.0768 | 0.0835 | 0.0800 | 0.0751 | 0.0742 |
| | DIV2K-val | 0.0593 | 0.0546 | 0.0526 | 0.0518 | 0.0484 | 0.0531 | 0.0507 | 0.0469 | 0.0459 |
| | BSD100 | 0.1165 | 0.1176 | 0.1163 | 0.1117 | 0.1089 | - | 0.1185 | 0.1078 | 0.1072 |
| | LSDIR | 0.0764 | 0.0699 | 0.0650 | 0.0641 | 0.0612 | - | 0.0650 | 0.0601 | 0.0591 |
| PSNR | Set14 | 26.594 | 26.860 | 27.228 | 27.361 | 27.246 | 27.282 | 27.526 | 27.822 | 27.421 |
| | Manga109 | 28.413 | 28.561 | 29.620 | 29.973 | 29.747 | 29.345 | 30.143 | 30.453 | 30.061 |
| | General100 | 29.425 | 29.424 | 30.289 | 30.482 | 30.251 | 30.104 | 30.441 | 30.752 | 30.401 |
| | Urban100 | 24.365 | 24.804 | 25.459 | 25.630 | 25.541 | 25.736 | 26.231 | 26.398 | 26.148 |
| | DIV2K-val | 28.175 | 28.182 | 28.819 | 29.079 | 28.910 | 28.784 | 29.117 | 29.543 | 29.137 |
| | BSD100 | 25.313 | 25.501 | 25.954 | 26.080 | 25.904 | - | 26.216 | 26.405 | 25.930 |
| | LSDIR | 23.882 | 24.232 | 24.663 | 24.933 | 24.845 | - | 25.129 | 25.419 | 25.038 |
| SSIM | Set14 | 0.7144 | 0.7254 | 0.7358 | 0.7402 | 0.7371 | 0.7407 | 0.7478 | 0.7578 | 0.7438 |
| | Manga109 | 0.8595 | 0.8590 | 0.8734 | 0.8827 | 0.8802 | 0.8796 | 0.8880 | 0.8949 | 0.8880 |
| | General100 | 0.8095 | 0.8091 | 0.8280 | 0.8335 | 0.8269 | 0.8305 | 0.8347 | 0.8415 | 0.8328 |
| | Urban100 | 0.7341 | 0.7474 | 0.7661 | 0.7724 | 0.7697 | 0.7786 | 0.7918 | 0.7947 | 0.7884 |
| | DIV2K-val | 0.7759 | 0.7720 | 0.7897 | 0.7978 | 0.7951 | 0.7911 | 0.8011 | 0.8121 | 0.8023 |
| | BSD100 | 0.6527 | 0.6596 | 0.6813 | 0.6841 | 0.6783 | - | 0.6923 | 0.6982 | 0.6813 |
| | LSDIR | 0.6866 | 0.6966 | 0.7117 | 0.7220 | 0.7202 | - | 0.7316 | 0.7397 | 0.7289 |

Quantitative comparison between AESOP (Ours) and baseline methods on standard benchmark datasets. The best results of each group are highlighted in bold. AESOP* denotes a model trained for only 200K iterations.

Acknowledgement

This project is built upon BasicSR, as well as DRCT, LDL, and SwinIR.

Contact

Please contact me via [email protected] for any inquiries.

Citation

Consider citing us if you find our paper useful in your research 😄

@article{lee2024auto,
  title={Auto-Encoded Supervision for Perceptual Image Super-Resolution},
  author={Lee, MinKyu and Hyun, Sangeek and Jun, Woojin and Heo, Jae-Pil},
  journal={arXiv preprint arXiv:2412.00124},
  year={2024}
}
