
Releases: Phhofm/models

Nature Dataset

13 Aug 09:15
196f42d

This is a curated version of the iNaturalist 2017 dataset for the purpose of training single image super resolution models. The original dataset consists of 675'170 images and is 200GB in size.

There is a small version that consists of 3000 images of 512x512px and can be used to train lightweight networks such as compact or SPAN. The average hyperiqa score of hr_small (3000 images) is 0.767434819261233.

There is also a medium version that consists of 7000 images of 512x512px and can be used to train medium or heavy networks such as RealPLKSR or RGT/DAT/ATD. The average hyperiqa score of hr_medium (7000 images) is 0.754073106459209.

The HR folder, the LRx2 and LRx4 folders, and a validation folder are provided in the Assets as zip files.

I will list the changes I applied (or simply what I did) below:

For the HR folder, I

  • moved all images into the same folder
  • removed all files that were smaller than 300kB -> 240'833 images left (from 675'170)
  • tiled to 512x512
  • hyperiqa scored all of them and removed all that were below 0.7 -> 32'499 images left, 18GB in size
  • checked all images for visual similarity and removed duplicates
  • removed a lot of human hand photos (too many human hands)
  • made a small version with 3k images that can be used for training lightweight sisr networks.
  • made a medium version with images that can be used for training medium/heavy sisr networks.
  • normalized filenames
  • oxipng -o 4 --strip safe --alpha *.png
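
A rough sketch of the size-filtering, 512x512 tiling and hyperiqa-scoring steps above. It assumes the pyiqa package (IQA-PyTorch), which to my knowledge ships a "hyperiqa" metric; the folder names and the non-overlapping tiling are placeholders, not the exact tooling I used:

```python
from pathlib import Path

from PIL import Image
import pyiqa  # assumption: IQA-PyTorch, which to my knowledge provides a "hyperiqa" metric

SRC = Path("inaturalist_raw")   # placeholder input folder
DST = Path("hr_tiles")          # placeholder output folder
DST.mkdir(parents=True, exist_ok=True)
TILE = 512

# 1) drop files smaller than 300kB, 2) tile everything to non-overlapping 512x512 crops
for path in SRC.rglob("*.jpg"):
    if path.stat().st_size < 300 * 1024:
        continue
    img = Image.open(path).convert("RGB")
    w, h = img.size
    for y in range(0, h - TILE + 1, TILE):
        for x in range(0, w - TILE + 1, TILE):
            img.crop((x, y, x + TILE, y + TILE)).save(DST / f"{path.stem}_{y}_{x}.png")

# 3) hyperiqa-score every tile and remove those below 0.7
metric = pyiqa.create_metric("hyperiqa")
kept = []
for tile in sorted(DST.glob("*.png")):
    score = float(metric(str(tile)))
    if score < 0.7:
        tile.unlink()
    else:
        kept.append(score)
if kept:
    print(f"{len(kept)} tiles kept, average hyperiqa score {sum(kept) / len(kept):.4f}")
```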

For the LRx4 folder, I took the HR folder and applied

  • scaling with randomized down_up (range 0.75, 1.5), linear, cubic_mitchell, lanczos, gauss and box
  • slight randomized gaussian blurring
  • randomized jpg compression with quality 75 - 100
  • oxipng -o 4 --strip safe --alpha *.png

The same approach was used for the LRx2 folder; a simplified sketch of this degradation pass is shown below.
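
A simplified sketch of such an LR degradation pass for one scale, assuming Pillow. Pillow has no exact cubic_mitchell or gauss filter, so its standard resampling filters stand in for the list above, and the parameter values are illustrative rather than the exact settings used:

```python
import random
from io import BytesIO
from pathlib import Path

from PIL import Image, ImageFilter

HR_DIR = Path("hr")        # placeholder paths
LR_DIR = Path("lr_x4")
LR_DIR.mkdir(parents=True, exist_ok=True)
SCALE = 4
# Pillow stand-ins for the listed filters (no exact cubic_mitchell/gauss kernels in Pillow)
FILTERS = [Image.Resampling.BILINEAR, Image.Resampling.BICUBIC,
           Image.Resampling.LANCZOS, Image.Resampling.BOX]

for path in sorted(HR_DIR.glob("*.png")):
    img = Image.open(path).convert("RGB")
    w, h = img.size

    # randomized down_up: rescale by a factor in [0.75, 1.5] before the final downscale
    if random.random() < 0.5:
        f = random.uniform(0.75, 1.5)
        img = img.resize((round(w * f), round(h * f)), random.choice(FILTERS))

    # slight randomized gaussian blur
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 1.0)))

    # final downscale to the LR size with a randomly chosen filter
    img = img.resize((w // SCALE, h // SCALE), random.choice(FILTERS))

    # randomized jpg compression with quality 75-100, stored back as png
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(75, 100))
    buf.seek(0)
    Image.open(buf).convert("RGB").save(LR_DIR / path.name)
```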

The corresponding zip files are in the Assets below. Since GitHub's file size limit for release assets is 2GB, the HR_medium archive was split into 2 files.

Example of HR images from the dataset:
Example1
Example2

Example of bad images removed from the original iNaturalist 2017 dataset:
Example_bad

The small HR folder:
smallversion

4xNature_realplksr_dysample

13 Aug 09:15
196f42d

Scale: 4
Architecture: RealPLKSR with Dysample
Architecture Option: realplksr

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Restoration
Subject: Realistic
Input Type: Images
Release Date: 13.08.2024

Dataset: Nature
Dataset Size: 7'000
OTF (on the fly augmentations): No
Pretrained Model: 4xNomos2_realplksr_dysample
Iterations: 265'000
Batch Size: 8
Patch Size: 64

Description:
A Dysample RealPLKSR 4x upscaling model for nature photographs (animals, plants).
LR prepared with down_up, linear, cubic_mitchell, lanczos, gauss and box scaling with some gaussian blur and jpg compression down to 75 (as released with my dataset, the LRx4 folder).
Trained with dysample, ea2fpn, ema, eco, adan_sf, mssim, perceptual, color, luma, dists, ldl and ff (see config toml file).
Based on my Nature Dataset which is a curated version of the iNaturalist 2017 Dataset for the purpose of training single image super resolution models.

Use the 4xNature_realplksr_dysample.pth file for inference. Also provided is a static onnx conversion with a 3x256x256 input. Config, state, and net_d files are additionally provided for trainers, to maybe create an improved version 2 of this model or to train a similar model from this state.
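
For reference, a minimal sketch of running the static onnx conversion with onnxruntime; the file names, the NCHW float32 layout and the [0, 1] value range are assumptions, so adjust to the actual export:

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# placeholder file names; the conversion is static, so the input patch must be 256x256
sess = ort.InferenceSession("4xNature_realplksr_dysample.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0].name

img = Image.open("lr_patch_256.png").convert("RGB")            # a 256x256 LR patch
x = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)[None] / 255.0

y = sess.run(None, {inp: x})[0]                                # expected (1, 3, 1024, 1024) for 4x
out = (y[0].transpose(1, 2, 0).clip(0, 1) * 255).round().astype(np.uint8)
Image.fromarray(out).save("sr_patch.png")
```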

Showcase:
(Click Image to enlarge)
Example1
Example2
Example3
Example4
Example5

1xDeNoise_realplksr_otf

08 Aug 15:16
196f42d

Scale: 1
Architecture: RealPLKSR
Architecture Option: realplksr

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Restoration, Denoise
Subject: Photography
Input Type: Images
Release Date: 08.08.2024

Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): Yes
Pretrained Model: 1xDeJPG_realplksr_otf
Iterations: 200'000
Batch Size: 8
Patch Size: 64

Description:
A 1x realplksr model to denoise, trained with the realesrgan-otf pipeline. It also handles a bit of jpg compression (if stronger jpg compression handling is needed, 1xDeJPG_realplksr_otf can be used).

Showcase:
(Click on image for better view)
Example1
Example2
Example3
Example4

1xDeH264_realplksr

08 Aug 10:19
196f42d

Scale: 1
Architecture: RealPLKSR
Architecture Option: realplksr

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Restoration, H264
Subject: Photography
Input Type: Images
Release Date: 08.08.2024

Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): No
Pretrained Model: 1xDeJPG_realplksr_otf
Iterations: 210'000
Batch Size: 8
Patch Size: 64

Description:
A 1x DeH264 model to remove h264 compression artifacts.

Showcase:
Slowpics
Imgsli

(Click on image for better view)
Example1
Example2
Example3
Example4

ArtFaces Dataset

08 Aug 08:40
3c76034

This is a curated version of the metfaces dataset for the purpose of training single image super resolution models.
It consists of 5630 images of 512x512px.
The HR folder, the LRx2 and LRx4 folders, and a validation folder are provided in the Assets as zip files.

I will list the changes I applied (or simply what I did) below:

For the HR folder, I

  • applied the multi-scale strategy, but with scales 1 and 0.5
  • cropped to sub-images -> all images are now 512x512
  • checked all images for visual similarity
  • hyperiqa scoring; average score with 6992 images is: 0.425619977407021
  • removed all images that scored lower than 0.3; the zip now also fits GitHub's 2GB file size limit for release Assets
  • hyperiqa scoring; average new score with 5630 images is: 0.4653959208652774
  • extracted val images from the previously removed images
  • normalized filenames
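
A rough sketch of the multi-scale (1 and 0.5) plus 512x512 sub-image cropping steps above, assuming Pillow; the folder names are placeholders, and the similarity check and hyperiqa scoring are left out:

```python
from pathlib import Path

from PIL import Image

SRC = Path("metfaces")        # placeholder input folder
DST = Path("artfaces_hr")     # placeholder output folder
DST.mkdir(parents=True, exist_ok=True)
TILE = 512
SCALES = [1.0, 0.5]           # the multi-scale strategy above: scale 1 and 0.5

for path in sorted(SRC.glob("*.png")):
    src = Image.open(path).convert("RGB")
    for s in SCALES:
        img = src if s == 1.0 else src.resize(
            (round(src.width * s), round(src.height * s)), Image.Resampling.LANCZOS)
        # crop into non-overlapping 512x512 sub-images
        for y in range(0, img.height - TILE + 1, TILE):
            for x in range(0, img.width - TILE + 1, TILE):
                img.crop((x, y, x + TILE, y + TILE)).save(
                    DST / f"{path.stem}_s{int(s * 100)}_{y}_{x}.png")
```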

For the LRx4 folder, I took the HR folder and applied

  • scaling with down_up (range 0.75, 1.5), linear, cubic_mitchell, lanczos, gauss and box
  • slight randomized gaussian blurring
  • randomized jpg compression with quality 75 - 100

The same approach was used for the LRx2 folder.

The corresponding zip files are in the Assets below.

Example of HR images:
image

4xArtFaces_realplksr_dysample

08 Aug 08:47
3c76034

Scale: 4
Architecture: RealPLKSR with Dysample
Architecture Option: realplksr

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Restoration
Subject: Art
Input Type: Images
Release Date: 08.08.2024

Dataset: ArtFaces
Dataset Size: 5'630
OTF (on the fly augmentations): No
Pretrained Model: 4xNomos2_realplksr_dysample
Iterations: 139'000
Batch Size: 6
Patch Size: 64

Description:
A Dysample RealPLKSR 4x upscaling model for art / painted faces.
Based on my ArtFaces Dataset which is a curated version of the metfaces dataset for the purpose of training single image super resolution models.

Showcase:
(Click Image to enlarge)
Example1
Example2
Example3
Example4
Example5

4xmssim_hma_pretrains

19 Jul 13:41
a579cef

Since no official HMA model releases exist yet, I am releasing my hma and hma_medium mssim pretrains.
These can be used to speed up and stabilize early training stages when training new hma models.
Trained with mssim on nomosv2.
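
As a sketch of how such a pretrain could be loaded outside of a training framework (in neosr you would normally just point the pretrain path of your config at the file), assuming PyTorch and the common "params_ema"/"params" key layout of SR releases:

```python
import torch

def load_pretrain(net: torch.nn.Module, path: str = "4xmssim_hma_pretrain.pth") -> None:
    """Copy the pretrain weights into a freshly built HMA network with matching options."""
    state = torch.load(path, map_location="cpu")
    # SR releases commonly nest weights under "params_ema" or "params" (an assumption here)
    for key in ("params_ema", "params"):
        if isinstance(state, dict) and key in state:
            state = state[key]
            break
    missing, unexpected = net.load_state_dict(state, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")

# usage, assuming build_hma() constructs an HMA network with the same architecture options:
# load_pretrain(build_hma())
```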

4xmssim_hma_pretrain

Scale: 4
Architecture: HMA
Architecture Option: hma

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Pretrained
Subject: Photography
Input Type: Images
Release Date: 19.07.2024

Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): No
Pretrained Model: None (=From Scratch)
Iterations: 205'000
Batch Size: 4
Patch Size: 96

Description: A pretrain to start hma model training.


4xmssim_hma_medium_pretrain

Scale: 4
Architecture: HMA
Architecture Option: hma_medium

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Pretrained
Subject: Photography
Input Type: Images
Release Date: 19.07.2024

Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): No
Pretrained Model: None (=From Scratch)
Iterations: 150'000
Batch Size: 4
Patch Size: 48

Description: A pretrain to start hma_medium model training.


Showcase:

slow.pics

Example1
Example2
Example3
Example4

4xHFA2k_ludvae_realplksr_dysample

13 Jul 11:11
258a27a

Scale: 4
Architecture: RealPLKSR with Dysample
Architecture Option: realplksr

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Restoration
Subject: Anime
Input Type: Images
Release Date: 13.07.2024

Dataset: HFA2k_LUDVAE
Dataset Size: 10'272
OTF (on the fly augmentations): No
Pretrained Model: 4xNomos2_realplksr_dysample
Iterations: 165'000
Batch Size: 12
GT Size: 256

Description:
A Dysample RealPLKSR 4x upscaling model for anime single-image super-resolution.
The dataset has been degraded using DM600_LUDVAE for more realistic noise/compression. Downscaling algorithms used were imagemagick box, triangle, catrom, lanczos and mitchell. Blurs applied were gaussian, box and lens blur (using chaiNNer). Some images were further compressed using -quality 75-92. Down-up was applied to roughly 10% of the dataset (5 to 15% variation in size). Degradation orders were shuffled to give as many variations as possible.
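
A minimal sketch of the shuffled-degradation-order idea, assuming Pillow; its resampling filters stand in for the imagemagick ones named above, and the parameters are illustrative rather than the real pipeline's settings:

```python
import random
from io import BytesIO

from PIL import Image, ImageFilter

def downscale(img: Image.Image, scale: int = 4) -> Image.Image:
    # Pillow resampling filters stand in for the imagemagick filters named above
    f = random.choice([Image.Resampling.BOX, Image.Resampling.BILINEAR,
                       Image.Resampling.BICUBIC, Image.Resampling.LANCZOS])
    return img.resize((img.width // scale, img.height // scale), f)

def blur(img: Image.Image) -> Image.Image:
    return img.filter(ImageFilter.GaussianBlur(random.uniform(0.2, 1.0)))

def compress(img: Image.Image) -> Image.Image:
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(75, 92))
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def degrade(img: Image.Image, scale: int = 4) -> Image.Image:
    # shuffle the degradation order per image to get as many variations as possible
    steps = [lambda im: downscale(im, scale), blur, compress]
    random.shuffle(steps)
    for step in steps:
        img = step(img)
    return img

# usage: degrade(Image.open("hr_tile.png").convert("RGB")).save("lr_tile.png")
```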

Examples are inferenced with the neosr testscript and the released pth file. I also include the test images as a zip file in this release, together with the model outputs, so others can test their models against these test images as well and compare.

The onnx conversions are static since dysample doesn't allow dynamic conversion; I tested the conversions with chaiNNer.

Showcase:
Slowpics

(Click Image to enlarge)
Example1
Example2
Example3
Example4
Example5
Example6
Example7
Example8
Example9
Example10
Example11
Example12
Example13
Example14

1xDeJPG_realplksr_otf

09 Jul 08:48
9109fd3

Scale: 1
Architecture: RealPLKSR
Architecture Option: realplksr

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: JPEG, Restoration
Subject: Photography
Input Type: Images
Release Date: 09.07.2024

Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): Yes
Pretrained Model: 1xDeJPG_realplksr_otf_60 (which itself used 4xNomos2_realplksr_dysample as its pretrain)
Iterations: 196'000
Batch Size: 24
GT Size: 64

Description:
A 1x dejpg model, trained with otf down to jpg quality 40 in both degradation processes; see the yml config file for details.
I additionally release the 1xDeJPG_realplksr_otf_60 model, which was trained with less jpg compression (down to quality 60 in both cases), in the hope that it could give a bit higher quality on less compressed inputs.
The 40 model is released as the default since it will be able to handle more compressed inputs in general.
Since these are trained with otf, they will also slightly deblur. The examples here are compressed, and then recompressed, with jpg quality 40.
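
For reproducing that kind of test input, a small sketch that compresses and then recompresses an image with jpg quality 40, assuming Pillow; the file names are placeholders:

```python
from io import BytesIO

from PIL import Image

def jpeg_roundtrip(img: Image.Image, quality: int) -> Image.Image:
    """Encode the image to an in-memory JPEG at the given quality and decode it again."""
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

# compress, then recompress, with jpg quality 40 (placeholder file names)
src = Image.open("clean_input.png").convert("RGB")
jpeg_roundtrip(jpeg_roundtrip(src, 40), 40).save("input_jpg40_twice.png")
```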

Showcase:
Slowpics
Imgsli

(Click on image for better view)
Example1
Example2
Example3
Example4
Example5
Example6
Example7
Example8

4xNomos2_realplksr_dysample

30 Jun 11:13
c472c82

Scale: 4
Architecture: RealPLKSR with Dysample
Architecture Option: realplksr

Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Pretrained
Subject: Photography
Input Type: Images
Release Date: 30.06.2024

Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): No
Pretrained Model: 4xmssim_realplksr_dysample_pretrain
Iterations: 185'000
Batch Size: 8
GT Size: 256, 512

Description:
A Dysample RealPLKSR 4x upscaling model trained on the Nomosv2 dataset; it handles jpg compression down to quality 70 and preserves DoF.
Based on the 4xmssim_realplksr_dysample_pretrain I released 3 days ago.
This model affects / saturates colors, which can be counteracted a bit by using wavelet color fix, as was done in these examples.
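
For context, a single-scale approximation of the wavelet color fix idea (keep the model's high-frequency detail, take the low-frequency color from the upscaled input); this is a sketch assuming Pillow and numpy, not the exact multi-scale implementation used for these examples:

```python
import numpy as np
from PIL import Image, ImageFilter

def simple_color_fix(sr: Image.Image, lr: Image.Image, radius: float = 5.0) -> Image.Image:
    """Keep high-frequency detail from the SR output, low-frequency color from the input."""
    ref = lr.resize(sr.size, Image.Resampling.BICUBIC)
    sr_arr = np.asarray(sr, dtype=np.float32)
    sr_low = np.asarray(sr.filter(ImageFilter.GaussianBlur(radius)), dtype=np.float32)
    ref_low = np.asarray(ref.filter(ImageFilter.GaussianBlur(radius)), dtype=np.float32)
    out = (sr_arr - sr_low) + ref_low          # detail from the SR output + color from the input
    return Image.fromarray(out.clip(0, 255).astype(np.uint8))

# usage: simple_color_fix(Image.open("sr.png").convert("RGB"),
#                         Image.open("lr.png").convert("RGB")).save("sr_colorfixed.png")
```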

Added a static (3x256x256) onnx conversion, with fp32 and fully optimized. This can be used with chaiNNer, since the dysample pth file would be unsupported. (Other conversions, like static 128, were removed because they would produce different results; the static 256 conversion gives the same result as using the pth file with the neosr testscript.)

Showcase:
Slowpics

(Click on image for better view)
Example1
Example2
Example3
Example4
Example5
Example6
Example7
Example8
Example9
Example10
Example11
Example12
Example13