
Releases: Phhofm/models

4xSPAN_pretrains

10 Apr 15:08
5f0ac00

Neosr's latest update from yesterday included a new adaptation of the multi-scale ssim loss.
This was an experiment to test the difference between making a SPAN pretrain with pixel loss using an L1 criterion (as often used in research) versus mssim as its only loss.
Both models are provided so they can be used for tests or as a pretrain for another SPAN model.


4xpix_span_pretrain

Scale: 4
Architecture: SPAN

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Pretrain
Subject: Realistic, Anime
Date: 10.04.2024

Dataset: nomos_uni
Dataset Size: 2989
OTF (on the fly augmentations): No
Pretrained Model: None
Iterations: 80'000
Batch Size: 12
GT Size: 128

Description: 4x SPAN pretrain trained with pixel loss using an L1 criterion (as often used in research) on the nomos_uni dataset, downsampled with Kim's Dataset Destroyer using down_up, linear, cubic_mitchell, lanczos, gauss and box (down_up itself used the same algorithms, with range = 0.15,1.5).
The new augmentations, except CutBlur, were also used (since CutBlur is meant for real-world SR and may cause undesired effects when applied to bicubic-only degradations).
Config and training log provided for more details.


4xmssim_span_pretrain

Scale: 4
Architecture: SPAN

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Pretrain
Subject: Realistic, Anime
Date: 10.04.2024

Dataset: nomos_uni
Dataset Size: 2989
OTF (on the fly augmentations): No
Pretrained Model: None
Iterations: 80'000
Batch Size: 12
GT Size: 128

Description: 4x SPAN pretrain trained with neosr's new adaptation of the multi-scale ssim loss from yesterday's update, on the nomos_uni dataset downsampled with Kim's Dataset Destroyer using down_up, linear, cubic_mitchell, lanczos, gauss and box (down_up itself used the same algorithms, with range = 0.15,1.5).
The new augmentations, except CutBlur, were also used (since CutBlur is meant for real-world SR and may cause undesired effects when applied to bicubic-only degradations).
Config and training log provided for more details.


Showcase:
7 Slowpics Examples

Example1
Example2
Example3
Example4
Example5
Example6
Example7

4xHFA2k_VCISR_GRLGAN_ep200

09 Apr 13:43
5f0ac00

Name: 4xHFA2k_VCISR_GRLGAN_ep200
Release Date: 04.01.2024
Author: Philip Hofmann
License: CC BY 4.0
Network: GRL
Scale: 4
Purpose: 4x anime upscaler handling video compression artifacts, trained for 200 epochs
Iterations: 85959
epoch: 200
batch_size: 6
HR_size: 128
Dataset: hfa2k
Number of train images: 2568
OTF Training: Yes
Pretrained_Model_G: None

Description:
4x anime upscaler that handles video compression artifacts, trained with OTF degradations for "mpeg2video", "libxvid", "libx264" and "libx265" with CRF 20-32 and MPEG bitrate 3800-5800 (together with the standard Real-ESRGAN OTF pipeline). A faster arch using this OTF degradation pipeline would be great for handling video compression artifacts; since this one is a GRL model and therefore slow, it is, as noted by the dev, maybe more for research purposes (or for single images/screenshots). Trained using VCISR for 200 epochs.

"This is epoch 200 and the start iteration is 85959 with learning rate 2.5e-05"

Slow Pics examples:
h264_crf28
ludvae1
ludvae2

Example1
Example2
Example3

2xNomosUni_compact_otf_medium

09 Apr 13:34
5f0ac00

Name: 2xNomosUni_compact_otf_medium
Author: Philip Hofmann
Release Date: 11.01.2024
License: CC BY 4.0
Network: SRVGGNetCompact
Scale: 2
Purpose: 2x fast universal upscaler with medium degradation handling (jpg compression, noise, blur)
Iterations: 276'000
epoch: 218
batch_size: 12
HR_size: 128
Dataset: nomosuni
Number of train images: 2989
OTF Training: Yes
Pretrained_Model_G: 2xNomosUni_compact_otf_strong

Description:
2x compact fast universal upscaler with medium degradation handling using the Real-ESRGAN training pipeline, based on 2xNomosUni_compact_otf_strong. Handles jpg compression, some noise, and some blur (so it dejpgs, denoises and deblurs).

Examples:
RealPhoto
Noisy

Example1
Example2

2xNomosUni_compact_multijpg

15 Apr 12:53
5f0ac00

Name: 2xNomosUni_compact_multijpg
Author: Philip Hofmann
Release Date: 13.12.2023
License: CC BY 4.0
Network: Compact (SRVGGNet)
Scale: 2
Purpose: 2x fast universal upscaler
Iterations: 30'000
epoch: 17
batch_size: 9
HR_size: 512
Dataset: nomosuni
Number of train images: 2989
OTF Training: No
Pretrained_Model_G: 2x-Compact-Pretrain

Description:
2x compact fast universal upscaler trained on image pairs with jpg degradation (down to quality 40) and multiscale downsampling (down_up, bicubic, bilinear, box, nearest, lanczos).
2xNomosUni_compact_multijpg

2xHFA2kShallowESRGAN

09 Apr 14:24
5f0ac00

Name: 2xHFA2kShallowESRGAN
Author: Philip Hofmann
Release Date: 04.01.2024
License: CC BY 4.0
Network: Shallow ESRGAN (6 Blocks)
Scale: 2
Purpose: 2x anime upscaler
Iterations: 180'000
epoch: 167
batch_size: 12
HR_size: 128
Dataset: hfa2k
Number of train images: 2568
OTF Training: Yes
Pretrained_Model_G: None

Description:
2x shallow esrgan version of the HFA2kCompact model.
This model should be usable with FAST_Anime_VSR using TensorRT for fast inference, as should my 2xHFA2kReal-CUGAN model.

Slow Pics examples:
Example 1
Example 2
Ludvae1
Ludvae2

Example1
Example2
Example3
Example4

All my self-trained sisr Models

07 Apr 18:48
0297704

I provide a Python script that concurrently downloads all my (latest) released models into a specified folder, skipping model files that already exist in that folder.

The script updates itself on execution, so the user is guaranteed to receive all my latest models.

The user can specify as input which types of models they want to download/sync.
Currently the script will download my models as either pth, safetensors or fp32 onnx files, depending on the user's choice. Pth is the recommended option since it will always be the most complete (it contains all my models), as that is the output format generated by the training software. Conversions are only partially available.

If there is demand, I might additionally provide fp16 onnx, fp32 ncnn and fp16 ncnn options in this script.
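The skip-existing, concurrent download behaviour described above can be sketched roughly like this (a minimal illustration, not the actual script; the model list and URL here are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlretrieve

# Placeholder entries -- the real script carries the full (name, url) list
# and refreshes it on execution.
MODELS = [
    ("2xNomosUni_compact_multijpg.pth", "https://example.com/model.pth"),
]

def download_model(entry, folder: Path):
    name, url = entry
    target = folder / name
    if target.exists():           # skip model files already in the folder
        return f"skipped {name}"
    urlretrieve(url, target)      # otherwise download it
    return f"downloaded {name}"

def sync_models(folder: str, workers: int = 8):
    dest = Path(folder)
    dest.mkdir(parents=True, exist_ok=True)
    # download concurrently; each worker handles one model file
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda e: download_model(e, dest), MODELS))
```

Running `sync_models` a second time on the same folder only fetches files that are missing, which is what makes re-syncing after a new release cheap.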

phhofm_sisr_models_download_script_demo

Previously this release contained (compressed) archive files, but this was superseded by this Python script. Not only does the script guarantee that the user receives all my (latest) released models, it is also much simpler for me to update by adding a few lines of code for a new release instead of (re)packing whole archives. It also removes a lot of redundancy (duplicated release files plus packing and unpacking archives). It's just a better solution overall. This rework happened thanks to przemoc pointing out that the previous state was suboptimal, so I came up with a better solution.

An example use case could be comparing all my models on the same input image: use this script to download all my models into a folder, let chaiNNer iterate through that folder to upscale the image with every model, then visually inspect the outputs to find the best model for that image or images of a similar style/type.


Ludvae200

25 Mar 11:20
0297704

Name: Ludvae200
License: CC BY 4.0
Author: Philip Hofmann
Network: LUD-VAE
Scale: 1
Release Date: 25.03.2024
Purpose: 1x realistic noise degradation model
Iterations: 190'000
H_size: 64
n_channels: 3
dataloader_batch_size: 16
H_noise_level: 8
L_noise_level: 3
Dataset: RealLR200
Number of train images: 200
OTF Training: No
Pretrained_Model_G: None

Description:
1x realistic noise degradation model, trained on the RealLR200 dataset as released on the SeeSR github repo.
Next to the ludvae200.pth model file, I provide a ludvae200.zip file which contains not only the code but also an inference script to run this model on the dataset of your choice.
Adapt the ludvae200_inference.py script by adjusting the file paths in the beginning section: your input folder, output folder, the folder holding the ludvae200.pth model, and the folder where you want the text file to be generated. I made the text file generation work the same way as in Kim's Dataset Destroyer: each image file is logged together with the values used to degrade that specific image file, and the resulting text file is append-only and never overwritten.
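The append-only logging described above could look roughly like this (a hypothetical sketch; the function name and log format are illustrative, not the actual ludvae200_inference.py code):

```python
import os
import tempfile
from random import uniform

def log_degradation(log_path, image_name, temperature_strength):
    # "a" mode appends: each run adds lines, never overwriting earlier runs
    with open(log_path, "a") as f:
        f.write(f"{image_name} temperature_strength={temperature_strength:.3f}\n")

# per-image usage, mirroring the uniform(0.1, 0.4) sampling in the script
log_path = os.path.join(tempfile.gettempdir(), "degradation_log.txt")
log_degradation(log_path, "img_0001.png", uniform(0.1, 0.4))
```

Because the file is only ever appended to, the log accumulates one line per processed image across runs, which is what lets you trace back which values degraded which image.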

You can also adjust the strength settings inside the inference script to fit your needs. If you generally want less strong noise, for example, lower the temperature upper limit from 0.4 to 0.2 or even further.
So in line 96, change "temperature_strength = uniform(0.1,0.4)" to "temperature_strength = uniform(0.1,0.2)", just to give an example.

These defaults reflect the needs of the last dataset degradation workflow I used, but feel free to adjust them. You can also do what I did: temporarily use deterministic values over multiple runs to determine the min and max noise levels you deem suitable for your dataset needs.

An example of what this looked like in the last dataset workflow I used my model in:

Determining min and max values: the min value here is noise 1, temperature 0.1, which leads to visibly discernible noise, while the max is simply the strongest noise I would want my upscaling model, trained on this dataset, to be able to handle from an input:

Ludvae200_range

Then simply three examples of what these settings produce:

Ludvae200_example1
Ludvae200_example
Ludvae200_example2

4xRealWebPhoto_v4_dat2

04 Apr 16:58
0297704

4xRealWebPhoto_v4_dat2

Scale: 4
Architecture: DAT

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Compression Removal, Deblur, Denoise, JPEG, WEBP, Restoration
Subject: Photography
Input Type: Images
Date: 04.04.2024

Architecture Option: DAT-2
I/O Channels: 3(RGB)->3(RGB)

Dataset: Nomos8k
Dataset Size: 8492
OTF (on the fly augmentations): No
Pretrained Model: DAT_2_x4
Iterations: 243'000
Batch Size: 4-6
GT Size: 128-256

Description: 4x upscaling model for photos from the web. The dataset consists of downscaled-only photos (to handle good quality), downscaled and compressed photos (uploaded to the web and compressed by the service provider), and downscaled, compressed, rescaled and recompressed photos (downloaded from the web and re-uploaded to the web).

Applied lens blur, realistic noise with my ludvae200 model, JPG and WEBP compression (quality 40-95), and the down_up, linear, cubic_mitchell, lanczos, gaussian and box downsampling algorithms. For details on the degradation process, check out the pdf with its explanations and visualizations.
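As a rough illustration of the parameter ranges above (a hypothetical per-image sampler, not the actual dataset-preparation code), one degradation pass might draw values like this:

```python
from random import choice, randint, seed

# downsampling algorithms listed in the description above
FILTERS = ["down_up", "linear", "cubic_mitchell", "lanczos", "gaussian", "box"]

def sample_degradation(rng_seed=None):
    """Draw one compression format, quality and downsampling filter for an image."""
    if rng_seed is not None:
        seed(rng_seed)  # fix the RNG for reproducible runs
    return {
        "format": choice(["jpg", "webp"]),
        "quality": randint(40, 95),   # compression quality range from the notes
        "filter": choice(FILTERS),
    }
```

Each training image would then be degraded with its own sampled combination, which is what gives the dataset its variety of compression/rescaling histories.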

This is basically a dat2 version of my previous 4xRealWebPhoto_v3_atd model, but trained with somewhat stronger noise values and a single image per variant, so a drastically reduced training dataset size.

Showcase:
12 Slowpics Examples
Example1
Example2
Example3
Example4
Example5
Example6
Example7
Example8
Example9
Example10
Example11
Example12

4xRealWebPhoto_v3_atd

22 Mar 14:17
0297704

Name: 4xRealWebPhoto_v3_atd
License: CC BY 4.0
Author: Philip Hofmann
Network: ATD
Scale: 4
Release Date: 22.03.2024
Purpose: 4x upscaler for photos downloaded from the web
Iterations: 250'000
epoch: 10
batch_size: 6, 3
HR_size: 128, 192
Dataset: 4xRealWebPhoto_v3
Number of train images: 101'904
OTF Training: No
Pretrained_Model_G: 003_ATD_SRx4_finetune

Description:
4x real web photo upscaler, meant for upscaling photos downloaded from the web. Trained on v3 of my 4xRealWebPhoto dataset, it should be able to handle noise, jpg and webp (re)compression, (re)scaling, and just a little bit of lens blur, while also being able to handle good-quality input. Trained on the very recently released (~2 weeks ago) Adaptive-Token-Dictionary network.

My 4xRealWebPhoto dataset tries to simulate the use case of a photo being uploaded to the web and processed by the service provider (as on a social media platform), so compression/downscaling, then perhaps being downloaded and re-uploaded by another user, where it is again processed by the service provider. I included different variants in the dataset. The pdf with info on the v2 dataset can be found here, while I simply included what's different in the v3 png:

4xRealWebPhoto_v3

Training details:
AdamW optimizer with U-Net SN discriminator and BFloat16.
Degraded with otf jpg compression down to 40, re-compression down to 40, together with resizes and the blur kernels.
Losses: PixelLoss using CHC (Clipped Huber with Cosine Similarity Loss), PerceptualLoss using Huber, GANLoss, LDL using Huber, Focal Frequency, Gradient Variance with Huber, YCbCr Color Loss (bt601) and Luma Loss (CIE XYZ) on neosr with norm: true.

11 Examples:
Slowpics

4xRealWebPhoto_v3_atd_example1
4xRealWebPhoto_v3_atd_example2
4xRealWebPhoto_v3_atd_example3
4xRealWebPhoto_v3_atd_example4
4xRealWebPhoto_v3_atd_example5
4xRealWebPhoto_v3_atd_example6
4xRealWebPhoto_v3_atd_example7
Example8

4xRealWebPhoto_v2_rgt_s

18 Mar 20:41
0297704

I will probably start releasing each of my trained models here as an individual GitHub release entry, so model files are in releases with a stable link (I have some catching up to do).

Name: 4xRealWebPhoto_v2_rgt_s
License: CC BY 4.0
Author: Philip Hofmann
Network: RGT
Network Option: RGT-S
Scale: 4
Release Date: 10.03.2024
Purpose: 4x real web photo upscaler, meant for upscaling photos downloaded from the web
Iterations: 220'000
epoch: 5
batch_size: 16
HR_size: 128
Dataset: 4xRealWebPhoto_v2 (see details in attached pdf file)
Number of train images: 1'086'976 (or 543'488 pairs)
OTF Training: No
Pretrained_Model_G: RGT_S_x4

Description:
4x real web photo upscaler, meant for upscaling photos downloaded from the web. Trained on v2 of my 4xRealWebPhoto dataset, it should be able to handle realistic noise, jpg and webp compression and re-compression, scaling and rescaling with multiple downsampling algorithms, and a little bit of lens blur.

Though the examples feature degraded images, this model should also be able to handle good-quality input.

Details about the approach/dataset I made to train this model (and therefore also what this model is capable of handling) are in the attached pdf.

My previous versions of this dataset, v0 and v1, will get separate entries, though this version is recommended over them.

12 Examples on Slowpics

4xRealWebPhoto_v2_rgt_s_ex1
4xRealWebPhoto_v2_rgt_s_ex2
4xRealWebPhoto_v2_rgt_s_ex3
4xRealWebPhoto_v2_rgt_s_ex4