Releases: Phhofm/models
4xSPAN_pretrains
Neosr's latest update from yesterday included a new adaptation of the multi-scale SSIM (MS-SSIM) loss.
This was an experiment to test the difference between making a SPAN pretrain with pixel loss using an L1 criterion (as often used in research) versus MS-SSIM as its only loss.
Models are provided so they can be used for tests, or as a pretrain for another SPAN model.
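As a rough illustration of the two objectives being compared, here is a minimal numpy sketch of an L1 pixel loss next to a simplified SSIM-based loss. This is not neosr's implementation: the real MS-SSIM loss uses a sliding Gaussian window and averages over multiple scales, while this sketch uses global image statistics; the function names are made up.

```python
import numpy as np

def l1_loss(sr, gt):
    """Mean absolute error between super-resolved and ground-truth images."""
    return np.mean(np.abs(sr - gt))

def ssim_loss(sr, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM computed from global image statistics.

    Simplification: the real MS-SSIM loss is windowed and multi-scale.
    """
    mu_x, mu_y = sr.mean(), gt.mean()
    var_x, var_y = sr.var(), gt.var()
    cov = ((sr - mu_x) * (gt - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
    return 1.0 - ssim

rng = np.random.default_rng(0)
gt = rng.random((64, 64))
sr = np.clip(gt + rng.normal(0, 0.05, gt.shape), 0, 1)
print(l1_loss(sr, gt), ssim_loss(sr, gt))
```

Both losses are zero for a perfect reconstruction; they differ in what kinds of errors they penalize most, which is exactly what this pretrain experiment probes.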
4xpix_span_pretrain
Scale: 4
Architecture: SPAN
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Pretrain
Subject: Realistic, Anime
Date: 10.04.2024
Dataset: nomos_uni
Dataset Size: 2989
OTF (on the fly augmentations): No
Pretrained Model: None
Iterations: 80'000
Batch Size: 12
GT Size: 128
Description: 4x SPAN pretrain trained with pixel loss using an L1 criterion (as often used in research) on the downsampled nomos_uni dataset, using Kim's dataset destroyer with down_up, linear, cubic_mitchell, lanczos, gauss and box (down_up used the same algorithms with range = 0.15,1.5).
The new augmentations except CutBlur have also been used (CutBlur is meant for real-world SR and may cause undesired effects when applied to bicubic-only data).
Config and training log provided for more details.
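The down_up degradation used for the LR side can be sketched roughly as follows. This is a hypothetical minimal version with integer factors and box/nearest kernels only; Kim's dataset destroyer picks among linear, cubic_mitchell, lanczos, gauss and box, and supports fractional scales in the stated 0.15-1.5 range.

```python
import numpy as np

def box_downscale(img, factor):
    """Downscale a 2D image by an integer factor via box (average) filtering."""
    h, w = img.shape
    h, w = h // factor * factor, w // factor * factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def down_up(img, factor):
    """down_up degradation: downscale, then upscale back to the original size.

    Nearest-neighbour upscaling is used here for simplicity; the real tool
    chooses a random kernel for each direction.
    """
    small = box_downscale(img, factor)
    return small.repeat(factor, axis=0).repeat(factor, axis=1)

rng = np.random.default_rng(1)
gt = rng.random((128, 128))
lr = box_downscale(down_up(gt, 2), 4)  # degraded, then 4x-downsampled LR counterpart
print(gt.shape, lr.shape)
```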
4xmssim_span_pretrain
Scale: 4
Architecture: SPAN
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Pretrain
Subject: Realistic, Anime
Date: 10.04.2024
Dataset: nomos_uni
Dataset Size: 2989
OTF (on the fly augmentations): No
Pretrained Model: None
Iterations: 80'000
Batch Size: 12
GT Size: 128
Description: 4x SPAN pretrain trained with neosr's new adaptation of the multi-scale SSIM loss from yesterday's update, on the downsampled nomos_uni dataset, using Kim's dataset destroyer with down_up, linear, cubic_mitchell, lanczos, gauss and box (down_up used the same algorithms with range = 0.15,1.5).
The new augmentations except CutBlur have also been used (CutBlur is meant for real-world SR and may cause undesired effects when applied to bicubic-only data).
Config and training log provided for more details.
Showcase:
7 Slowpics Examples
4xHFA2k_VCISR_GRLGAN_ep200
Name: 4xHFA2k_VCISR_GRLGAN_ep200
Release Date: 04.01.2024
Author: Philip Hofmann
License: CC BY 4.0
Network: GRL
Scale: 4
Purpose: 4x anime upscaler handling video compression artifacts, trained for 200 epochs
Iterations: 85959
epoch: 200
batch_size: 6
HR_size: 128
Dataset: hfa2k
Number of train images: 2568
OTF Training: Yes
Pretrained_Model_G: None
Description:
4x anime upscaler handling video compression artifacts, trained with OTF degradations for "mpeg2video", "libxvid", "libx264" and "libx265" with CRF 20-32 and MPEG bitrate 3800-5800 (together with the standard Real-ESRGAN OTF pipeline). A faster arch using this OTF degradation pipeline would be great for handling video compression artifacts; since this one is a GRL model and therefore slow, it is, as noted by the dev, maybe more suited for research purposes (or for single images/screenshots). Trained using VCISR for 200 epochs.
"This is epoch 200 and the start iteration is 85959 with learning rate 2.5e-05"
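The video-compression degradations named above correspond roughly to ffmpeg invocations like the one sketched below. This is a hypothetical illustration of one compression round using the stated ranges, not the actual VCISR pipeline code, and the exact flags the real pipeline passes (especially for libxvid) may differ.

```python
import random

def compression_args(codec, rng=random):
    """Hypothetical ffmpeg arguments for one compression round, using the
    ranges from the description: CRF 20-32 for libx264/libx265, bitrate
    3800-5800 for the MPEG-style codecs."""
    if codec in ("libx264", "libx265"):
        return ["-c:v", codec, "-crf", str(rng.randint(20, 32))]
    return ["-c:v", codec, "-b:v", f"{rng.randint(3800, 5800)}k"]

# pick one of the four codecs at random, as the OTF pipeline would
codec = random.choice(["mpeg2video", "libxvid", "libx264", "libx265"])
cmd = ["ffmpeg", "-i", "in.mp4", *compression_args(codec), "out.mp4"]
print(" ".join(cmd))
```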
Slow Pics examples:
h264_crf28
ludvae1
ludvae2
2xNomosUni_compact_otf_medium
Name: 2xNomosUni_compact_otf_medium
Author: Philip Hofmann
Release Date: 11.01.2024
License: CC BY 4.0
Network: SRVGGNetCompact
Scale: 2
Purpose: 2x fast universal upscaler with medium degradation handling (jpg compression, noise, blur)
Iterations: 276'000
epoch: 218
batch_size: 12
HR_size: 128
Dataset: nomosuni
Number of train images: 2989
OTF Training: Yes
Pretrained_Model_G: 2xNomosUni_compact_otf_strong
Description:
2x compact fast universal upscaler with medium degradation handling, trained using the Real-ESRGAN training pipeline and based on 2xNomosUni_compact_otf_strong. Handles jpg compression, some noise, and some blur (so it dejpgs, denoises and deblurs).
2xNomosUni_compact_multijpg
Name: 2xNomosUni_compact_multijpg
Author: Philip Hofmann
Release Date: 13.12.2023
License: CC BY 4.0
Network: Compact (SRVGGNet)
Scale: 2
Purpose: 2x fast universal upscaler
Iterations: 30'000
epoch: 17
batch_size: 9
HR_size: 512
Dataset: nomosuni
Number of train images: 2989
OTF Training: No
Pretrained_Model_G: 2x-Compact-Pretrain
Description:
2x compact fast universal upscaler pair trained with jpg degradation (down to 40) and multiscale (down_up, bicubic, bilinear, box, nearest, lanczos).
2xHFA2kShallowESRGAN
Name: 2xHFA2kShallowESRGAN
Author: Philip Hofmann
Release Date: 04.01.2024
License: CC BY 4.0
Network: Shallow ESRGAN (6 Blocks)
Scale: 2
Purpose: 2x anime upscaler
Iterations: 180'000
epoch: 167
batch_size: 12
HR_size: 128
Dataset: hfa2k
Number of train images: 2568
OTF Training: Yes
Pretrained_Model_G: None
Description:
2x shallow esrgan version of the HFA2kCompact model.
This model should be usable with FAST_Anime_VSR using TensorRT for fast inference, as should my 2xHFA2kReal-CUGAN model.
All my self-trained sisr Models
I provide a Python script that concurrently downloads all my (latest) released models into a specified folder, skipping model files that already exist in that folder.
The script updates itself on execution, so the user is guaranteed to receive all my latest models.
The user can specify as an input which types of models they want to download/sync.
Currently the script will download my models as either pth, safetensors or fp32 onnx files, depending on the user's choice. Pth is the recommended option since it will always be the most complete (containing all my models), as that is the output file generated by the training software. Conversions are only partially available.
If there is demand, I might additionally provide fp16 onnx, fp32 ncnn and fp16 ncnn options in this script.
Previously this release contained (compressed) archive files, but this was superseded by the Python script. Not only does the script guarantee that the user receives all my (latest) released models, it is also far simpler for me to update by adding a few lines of code on a new release, instead of (re)packing whole archives. It also avoids a lot of redundancy (duplicated files across releases, plus packing and unpacking archives). It's just a better solution overall. This rework happened thanks to przemoc's input that the previous state was suboptimal, so I came up with a better solution.
An example use case could be comparing all my models on the same input image: this script downloads all my models into a folder, then chaiNNer can be used to iterate through that folder, upscaling the image with every model, and then all the outputs can be visually inspected to find the best model for that image or for images of a similar style/type.
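The sync behaviour described above (skip files already on disk, fetch the rest concurrently) can be sketched like this. This is a simplified hypothetical version, not the actual script: the URL is made up, and the self-update step and model-type selection are omitted.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve

def missing_models(model_urls, folder):
    """Keep only the (url, target-path) pairs whose file is not on disk yet."""
    pairs = [(u, os.path.join(folder, os.path.basename(u))) for u in model_urls]
    return [(u, p) for u, p in pairs if not os.path.exists(p)]

def sync(model_urls, folder, workers=8):
    """Concurrently fetch every model that is not already in `folder`."""
    os.makedirs(folder, exist_ok=True)
    todo = missing_models(model_urls, folder)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda job: urlretrieve(*job), todo))
    return [p for _, p in todo]

# hypothetical URL, for illustration only
urls = ["https://example.com/models/4xmssim_span_pretrain.pth"]
print([os.path.basename(p) for _, p in missing_models(urls, ".")])
```

Skipping existing files is what makes re-running the script cheap: only newly released models are actually downloaded.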
Ludvae200
Name: Ludvae200
License: CC BY 4.0
Author: Philip Hofmann
Network: LUD-VAE
Scale: 1
Release Date: 25.03.2024
Purpose: 1x realistic noise degradation model
Iterations: 190'000
H_size: 64
n_channels: 3
dataloader_batch_size: 16
H_noise_level: 8
L_noise_level: 3
Dataset: RealLR200
Number of train images: 200
OTF Training: No
Pretrained_Model_G: None
Description:
1x realistic noise degradation model, trained on the RealLR200 dataset as released on the SeeSR github repo.
Next to the ludvae200.pth model file, I provide a ludvae200.zip file which not only contains the code but also an inference script to run this model on the dataset of your choice.
Adapt the ludvae200_inference.py script by adjusting the file paths in its beginning section: your input folder, output folder, the folder holding the ludvae200.pth model, and the folder where the text file should be generated. I made the text file generation work the same way as in Kim's Dataset Destroyer: each image file is logged together with the values used to degrade it, and the resulting text file is append-only and never overwritten.
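The append-only logging just described can be sketched as follows (a hypothetical helper with made-up file and field names, not the actual script code):

```python
def log_degradation(log_path, image_name, noise_level, temperature):
    """Append one line per degraded image; earlier runs are never overwritten."""
    with open(log_path, "a") as f:
        f.write(f"{image_name} noise={noise_level} temperature={temperature:.2f}\n")

# hypothetical file names, for illustration
log_degradation("degradations.txt", "img_0001.png", 4, 0.23)
log_degradation("degradations.txt", "img_0002.png", 2, 0.17)
```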
You can also adjust the strength settings inside the inference script to fit your needs. If you generally want weaker noise, for example, lower the upper temperature limit from 0.4 to 0.2 or even further.
So in line 96 change "temperature_strength = uniform(0.1,0.4)" to "temperature_strength = uniform(0.1,0.2)" just to give an example.
These defaults match the last dataset degradation workflow I used, but feel free to adjust them. You can also do what I did: temporarily use deterministic values across multiple runs to determine the min and max noise levels you deem suitable for your dataset.
An example of what this looked like for the last dataset workflow I used my model in:
Determining min and max values. The min here is noise 1, temperature 0.1, which leads to visibly discernible noise, while the max is simply the maximum degree of noise I would want an upscaling model trained on this dataset to be able to handle from an input:
Then simply three examples of what these settings will produce:
4xRealWebPhoto_v4_dat2
4xRealWebPhoto_v4_dat2
Scale: 4
Architecture: DAT
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Compression Removal, Deblur, Denoise, JPEG, WEBP, Restoration
Subject: Photography
Input Type: Images
Date: 04.04.2024
Architecture Option: DAT-2
I/O Channels: 3(RGB)->3(RGB)
Dataset: Nomos8k
Dataset Size: 8492
OTF (on the fly augmentations): No
Pretrained Model: DAT_2_x4
Iterations: 243'000
Batch Size: 4-6
GT Size: 128-256
Description: 4x upscaling model for photos from the web. The dataset consists of downscaled-only photos (to handle good-quality input), downscaled and compressed photos (uploaded to the web and compressed by the service provider), and downscaled, compressed, rescaled and recompressed photos (downloaded from the web and re-uploaded).
Applied lens blur, realistic noise with my ludvae200 model, JPG and WEBP compression (40-95), and down_up, linear, cubic_mitchell, lanczos, gaussian and box downsampling algorithms. For details on the degradation process, check out the pdf with its explanations and visualizations.
This is basically a dat2 version of my previous 4xRealWebPhoto_v3_atd model, but trained with somewhat stronger noise values and only a single image per variant, hence a drastically reduced training dataset size.
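The JPG compression step (quality 40-95) can be approximated with Pillow as below. This is a sketch only: the dataset was actually built with Kim's dataset destroyer, which also applied WEBP compression, lens blur, realistic noise via the ludvae200 model, and several downsampling algorithms.

```python
import io
import random
from PIL import Image

def web_compress(img, rng=random):
    """One web-style compression round: JPEG at a random quality in 40-95.

    WEBP is analogous via format="WEBP" when Pillow is built with webp support.
    """
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=rng.randint(40, 95))
    buf.seek(0)
    return Image.open(buf).convert("RGB")

img = Image.new("RGB", (64, 64), (200, 120, 40))
out = web_compress(web_compress(img))  # compressed, then re-compressed
print(out.size)
```

Applying the function twice mimics the upload/re-upload cycle the dataset simulates: a second round of lossy compression on already-compressed pixels.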
Showcase:
12 Slowpics Examples
4xRealWebPhoto_v3_atd
Name: 4xRealWebPhoto_v3_atd
License: CC BY 4.0
Author: Philip Hofmann
Network: ATD
Scale: 4
Release Date: 22.03.2024
Purpose: 4x upscaler for photos downloaded from the web
Iterations: 250'000
epoch: 10
batch_size: 6, 3
HR_size: 128, 192
Dataset: 4xRealWebPhoto_v3
Number of train images: 101'904
OTF Training: No
Pretrained_Model_G: 003_ATD_SRx4_finetune
Description:
4x real web photo upscaler, meant for upscaling photos downloaded from the web. Trained on v3 of my 4xRealWebPhoto dataset, it should be able to handle noise, jpg and webp (re)compression and (re)scaling, plus a little bit of lens blur, while also being able to handle good-quality input. Trained on the very recently released (~2 weeks ago) Adaptive-Token-Dictionary network.
My 4xRealWebPhoto dataset tries to simulate the use case of a photo being uploaded to the web and processed by the service provider (like on a social media platform), so compression/downscaling, then maybe being downloaded and re-uploaded by another user, where it is, again, processed by the service provider. I included different variants in the dataset. The pdf with info on the v2 dataset can be found here, while I simply included what's different in the v3 png:
Training details:
AdamW optimizer with U-Net SN discriminator and BFloat16.
Degraded with otf jpg compression down to 40, re-compression down to 40, together with resizes and the blur kernels.
Losses: PixelLoss using CHC (Clipped Huber with Cosine Similarity Loss), PerceptualLoss using Huber, GANLoss, LDL using Huber, Focal Frequency, Gradient Variance with Huber, YCbCr Color Loss (bt601) and Luma Loss (CIE XYZ) on neosr with norm: true.
11 Examples:
Slowpics
4xRealWebPhoto_v2_rgt_s
I will probably start releasing each of my trained models here as an individual github release entry so model files are in releases, with a stable link (have some catching up to do)
Name: 4xRealWebPhoto_v2_rgt_s
License: CC BY 4.0
Author: Philip Hofmann
Network: RGT
Network Option: RGT-S
Scale: 4
Release Date: 10.03.2024
Purpose: 4x real web photo upscaler, meant for upscaling photos downloaded from the web
Iterations: 220'000
epoch: 5
batch_size: 16
HR_size: 128
Dataset: 4xRealWebPhoto_v2 (see details in attached pdf file)
Number of train images: 1'086'976 (or 543'488 pairs)
OTF Training: No
Pretrained_Model_G: RGT_S_x4
Description:
4x real web photo upscaler, meant for upscaling photos downloaded from the web. Trained on v2 of my 4xRealWebPhoto dataset, it should be able to handle realistic noise, jpg and webp compression and re-compression, scaling and rescaling with multiple downsampling algos, and a little bit of lens blur.
Though the examples feature degraded images, this model should also be able to handle good-quality input.
Details about the approach/dataset I made to train this model (and therefore also what this model should be capable of handling) are in the attached pdf.
My previous tries at this dataset, meaning v0 and v1, will get a separate entry, though this version is recommended over them.
12 Examples on Slowpics