MinKyu Lee, Sangeek Hyun, Woojin Jun, Jae-Pil Heo*
Sungkyunkwan University
*: Corresponding Author
This is the official repository of "Auto-Encoded Supervision for Perceptual Image Super-Resolution".
⭐ If you find AESOP helpful, please consider giving this repository a star. Thanks!
This work tackles the fidelity objective in perceptual super-resolution (SR). Specifically, we address the shortcomings of the pixel-level $L_\text{p}$ loss ($L_\text{pix}$) in the GAN-based SR framework. Since $L_\text{pix}$ is known to trade off against perceptual quality, prior methods often multiply it by a small scale factor or apply low-pass filters. However, this work shows that these circumventions fail to address the fundamental factor that induces blurring. Accordingly, we focus on two points: 1) precisely discriminating the subcomponent of $L_\text{pix}$ that contributes to blurring, and 2) guiding only with the factor that is free from this trade-off. We show that both can be achieved in a surprisingly simple manner, with an Auto-Encoder (AE) pretrained with $L_\text{pix}$. Accordingly, we propose the Auto-Encoded Supervision for Optimal Penalization loss ($L_\text{AESOP}$), a novel loss function that measures distance in the AE space instead of the raw pixel space. Note that the AE space refers to the space after the decoder, not the bottleneck. By simply substituting $L_\text{pix}$ with $L_\text{AESOP}$, we can provide effective reconstruction guidance without compromising perceptual quality. Designed for simplicity, our method enables easy integration into existing SR frameworks. Experimental results verify that AESOP leads to favorable results in the perceptual SR task.
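A minimal sketch (not the official training code) of the substitution described above: the pixel-space reconstruction term of a typical GAN-based SR objective is swapped for $L_\text{AESOP}$, while the perceptual and adversarial terms stay untouched. `aesop_loss`, `perceptual_loss`, `gan_loss`, and the loss weights are placeholders, not values from the paper.

```python
def generator_loss(sr, hr, aesop_loss, perceptual_loss, gan_loss,
                   w_rec=1.0, w_per=1.0, w_gan=5e-3):
    # Before: w_rec * F.l1_loss(sr, hr)      (raw pixel-space guidance, L_pix)
    # After:  w_rec * aesop_loss(sr, hr)     (Auto-Encoded-space guidance, L_AESOP)
    return (w_rec * aesop_loss(sr, hr)
            + w_per * perceptual_loss(sr, hr)
            + w_gan * gan_loss(sr))
```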
- 2024-12-04: Repository created.
- 2025-02-07: Our paper has been accepted to CVPR 2025.
- Code will be released soon. Stay tuned!
(Fig. 1) Conceptual illustration of the proposed AESOP loss and the pixel-level $L_\text{p}$ reconstruction guidance employed in typical perceptual SR methods. (a) A fidelity-oriented SR network trained with $L_\text{pix}$ estimates the average over plausible solutions (i.e., the optimal fidelity point). Meanwhile, perceptual SR involves a range of multiple solutions scattered around the optimal fidelity point. Thus, we identify two fundamental components of a perceptual SR image: 1) the perceptual variance factor (red line), which possesses randomness and contributes to realistic textures, and 2) the fidelity bias term (orange dot), the residual blurry component of an SR image, which contributes to overall fidelity apart from the perceptual variance. (b) Typical perceptual SR methods adopt $L_\text{pix}$ for reconstruction guidance, which pushes the perceptual variance factor to vanish. Thus, when combined with perceptual-quality-oriented losses that encourage this variance factor, a conflict arises, leading to suboptimal performance. (c) In contrast, $L_\text{AESOP}$ penalizes only the fidelity bias-induced error while preserving these critical perceptual variance factors, ensuring improved fidelity without sacrificing perceptual quality.
(Fig. 2) Loss map comparison between $L_\text{AESOP}$ and $L_\text{pix}$. $L_\text{pix}$ indiscriminately penalizes all factors, including visually important fine-grained details. $L_\text{AESOP}$ penalizes only the fidelity bias factor.
(Left, Fig. 3) Conceptual illustration of the perception-distortion (PD) trade-off curves that $L_\text{AESOP}$ and $L_\text{pix}$ lead to.
(Right, Fig. 4) The optimization procedure of $L_\text{AESOP}$ and $L_\text{pix}$. Our $L_\text{AESOP}$ is distribution-preserving, while $L_\text{pix}$ is not.
AESOP takes the $L_\text{p}$ distance in the Auto-Encoded space (after the decoder, not the bottleneck) instead of the raw pixel space. The carefully designed Auto-Encoder architecture and pretraining objective enable AESOP to penalize only the fidelity bias factors (the Auto-Encoded image shown in Fig. 2) while retaining the perceptual variance. In this way, AESOP improves fidelity without degrading perceptual quality. (Shouldn't any distortion measure lead to degraded perception? See Appendix F of the paper.)
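Below is a minimal sketch of this idea: the distance is measured between decoder outputs of a frozen Auto-Encoder pretrained with $L_\text{pix}$, rather than between raw pixels. The L1 form of the distance, the frozen AE, and the loss weight are illustrative assumptions, not the official configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AESOPLoss(nn.Module):
    """Reconstruction loss measured in the Auto-Encoded (decoder-output) space."""

    def __init__(self, autoencoder: nn.Module, loss_weight: float = 1.0):
        super().__init__()
        self.ae = autoencoder.eval()          # L_pix-pretrained AE, kept frozen
        for p in self.ae.parameters():
            p.requires_grad_(False)
        self.loss_weight = loss_weight

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        ae_sr = self.ae(sr)                   # gradients flow back to the SR network
        with torch.no_grad():
            ae_hr = self.ae(hr)               # fidelity-bias reference
        # Distance after the decoder, not in the raw pixel space.
        return self.loss_weight * F.l1_loss(ae_sr, ae_hr)
```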
| Metric | Method | Set14 | Manga109 | General100 | Urban100 | DIV2K | BSD100 | LSDIR |
|---|---|---|---|---|---|---|---|---|
| AE-PSNR | ESRGAN | 30.280 | 31.165 | 32.663 | 27.198 | 31.668 | 28.991 | 27.636 |
| | SPSR | 30.602 | 31.351 | 32.670 | 27.508 | 31.737 | 29.029 | 27.881 |
| | LDL | 31.180 | 32.608 | 33.823 | 28.488 | 32.597 | 29.595 | 28.625 |
| | AESOP (Ours) | 31.341 | 32.843 | 33.956 | 28.529 | 32.740 | 29.737 | 28.812 |
| LR-PSNR | ESRGAN | 43.892 | 43.908 | 45.259 | 42.879 | 45.689 | 43.823 | 42.718 |
| | SPSR | 43.835 | 44.359 | 44.656 | 42.666 | 44.717 | 42.719 | 42.364 |
| | LDL | 46.497 | 47.603 | 48.184 | 45.975 | 47.793 | 45.307 | 45.295 |
| | AESOP (Ours) | 46.625 | 48.188 | 48.653 | 46.280 | 48.272 | 45.837 | 45.571 |
Quantitative comparison between AESOP (Ours) and baseline methods. AE-PSNR and LR-PSNR quantify how well an SR network estimates the fidelity bias. The best results of each group are highlighted in bold.
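For reference, a hedged sketch of how these two fidelity metrics can be computed. LR-PSNR follows the common definition (PSNR between the downsampled SR image and the LR input); AE-PSNR is read here as PSNR computed between the auto-encoded SR and HR images, i.e., PSNR restricted to the fidelity-bias component. The exact downsampling kernel and the AE-PSNR definition are assumptions, not taken from the official evaluation code.

```python
import torch
import torch.nn.functional as F

def psnr(a: torch.Tensor, b: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    mse = F.mse_loss(a, b)
    return 10.0 * torch.log10(max_val ** 2 / mse)

@torch.no_grad()
def ae_psnr(sr: torch.Tensor, hr: torch.Tensor, autoencoder) -> torch.Tensor:
    # PSNR measured in the Auto-Encoded (decoder-output) space.
    return psnr(autoencoder(sr), autoencoder(hr))

@torch.no_grad()
def lr_psnr(sr: torch.Tensor, lr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    # Downsample the SR output back to LR resolution and compare with the LR input.
    sr_down = F.interpolate(sr, scale_factor=1 / scale, mode="bicubic", antialias=True)
    return psnr(sr_down, lr)
```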
| Metrics | Benchmark | ESRGAN | SPSR | LDL | AESOP | AESOP | +GAN | LDL | AESOP* | AESOP |
|---|---|---|---|---|---|---|---|---|---|---|
| Backbone | | RRDB | RRDB | RRDB | RRDB | RRDB | RRDB | SwinIR | SwinIR | SwinIR |
| Patch size | | 128 | 128 | 128 | 128 | 256 | 256 | 256 | 256 | 256 |
| LPIPS | Set14 | 0.1241 | 0.1207 | 0.1132 | 0.1067 | 0.1053 | 0.1160 | 0.1091 | 0.1023 | 0.1027 |
| | Manga109 | 0.0649 | 0.0672 | 0.0544 | 0.0525 | 0.0494 | 0.0542 | 0.0469 | 0.0440 | 0.0461 |
| | General100 | 0.0879 | 0.0862 | 0.0796 | 0.0784 | 0.0734 | 0.0796 | 0.0740 | 0.0717 | 0.0710 |
| | Urban100 | 0.1229 | 0.1184 | 0.1084 | 0.1064 | 0.1033 | 0.1077 | 0.1021 | 0.0961 | 0.0945 |
| | DIV2K-val | 0.1154 | 0.1099 | 0.0999 | 0.0977 | 0.0936 | 0.1038 | 0.0944 | 0.0909 | 0.0893 |
| | BSD100 | 0.1616 | 0.1609 | 0.1535 | 0.1515 | 0.1443 | - | 0.1572 | 0.1441 | 0.1385 |
| | LSDIR | 0.1378 | 0.1312 | 0.1180 | 0.1152 | 0.1123 | - | 0.1132 | 0.1094 | 0.1071 |
| DISTS | Set14 | 0.0951 | 0.0920 | 0.0866 | 0.0852 | 0.0825 | 0.0930 | 0.0869 | 0.0809 | 0.0819 |
| | Manga109 | 0.0471 | 0.0463 | 0.0355 | 0.0360 | 0.0356 | 0.0365 | 0.0315 | 0.0327 | 0.0328 |
| | General100 | 0.0874 | 0.0884 | 0.0801 | 0.0798 | 0.0773 | 0.0835 | 0.0794 | 0.0768 | 0.0762 |
| | Urban100 | 0.0880 | 0.0849 | 0.0793 | 0.0793 | 0.0768 | 0.0835 | 0.0800 | 0.0751 | 0.0742 |
| | DIV2K-val | 0.0593 | 0.0546 | 0.0526 | 0.0518 | 0.0484 | 0.0531 | 0.0507 | 0.0469 | 0.0459 |
| | BSD100 | 0.1165 | 0.1176 | 0.1163 | 0.1117 | 0.1089 | - | 0.1185 | 0.1078 | 0.1072 |
| | LSDIR | 0.0764 | 0.0699 | 0.0650 | 0.0641 | 0.0612 | - | 0.0650 | 0.0601 | 0.0591 |
| PSNR | Set14 | 26.594 | 26.860 | 27.228 | 27.361 | 27.246 | 27.282 | 27.526 | 27.822 | 27.421 |
| | Manga109 | 28.413 | 28.561 | 29.620 | 29.973 | 29.747 | 29.345 | 30.143 | 30.453 | 30.061 |
| | General100 | 29.425 | 29.424 | 30.289 | 30.482 | 30.251 | 30.104 | 30.441 | 30.752 | 30.401 |
| | Urban100 | 24.365 | 24.804 | 25.459 | 25.630 | 25.541 | 25.736 | 26.231 | 26.398 | 26.148 |
| | DIV2K-val | 28.175 | 28.182 | 28.819 | 29.079 | 28.910 | 28.784 | 29.117 | 29.543 | 29.137 |
| | BSD100 | 25.313 | 25.501 | 25.954 | 26.080 | 25.904 | - | 26.216 | 26.405 | 25.930 |
| | LSDIR | 23.882 | 24.232 | 24.663 | 24.933 | 24.845 | - | 25.129 | 25.419 | 25.038 |
| SSIM | Set14 | 0.7144 | 0.7254 | 0.7358 | 0.7402 | 0.7371 | 0.7407 | 0.7478 | 0.7578 | 0.7438 |
| | Manga109 | 0.8595 | 0.8590 | 0.8734 | 0.8827 | 0.8802 | 0.8796 | 0.8880 | 0.8949 | 0.8880 |
| | General100 | 0.8095 | 0.8091 | 0.8280 | 0.8335 | 0.8269 | 0.8305 | 0.8347 | 0.8415 | 0.8328 |
| | Urban100 | 0.7341 | 0.7474 | 0.7661 | 0.7724 | 0.7697 | 0.7786 | 0.7918 | 0.7947 | 0.7884 |
| | DIV2K-val | 0.7759 | 0.7720 | 0.7897 | 0.7978 | 0.7951 | 0.7911 | 0.8011 | 0.8121 | 0.8023 |
| | BSD100 | 0.6527 | 0.6596 | 0.6813 | 0.6841 | 0.6783 | - | 0.6923 | 0.6982 | 0.6813 |
| | LSDIR | 0.6866 | 0.6966 | 0.7117 | 0.7220 | 0.7202 | - | 0.7316 | 0.7397 | 0.7289 |
Quantitative comparison between AESOP (Ours) and baseline methods on standard benchmark datasets. The best results of each group are highlighted in bold. AESOP* indicates training for only 200K iterations.
This project is built on BasicSR and also draws on DRCT, LDL, and SwinIR.
Please contact me via [email protected] for any inquiries.
Please consider citing us if you find our paper useful in your research.
@article{lee2024auto,
title={Auto-Encoded Supervision for Perceptual Image Super-Resolution},
author={Lee, MinKyu and Hyun, Sangeek and Jun, Woojin and Heo, Jae-Pil},
journal={arXiv preprint arXiv:2412.00124},
year={2024}
}