2xBHI_small_compact_pretrain
Scale: 2x
Network type: Compact
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 2x compact high-psnry model, trained with L1 & MS-SSIM losses only.
Pretrained Model: Itself
Training iterations: 50'000 (650'000 iterations in total across all runs)
Description: 2x compact pretrain model. The goal here was simply to reach high psnry metrics on Urban100. This model was trained on (Pillow) bicubic-downsampled LRs only, using the LR set I provide with my bhi_small dataset (see the x2, HR and Urban100 zip files in there). Only L1 and MS-SSIM losses were used.
Process:
At first I ran some tests to see which config values I wanted to use; these were very short tests of only 10k iterations each.
These were all 4x compact from scratch tests:
Out of these, L1 with weight 0.6 and MS-SSIM with weight 0.4 seemed to work best.
A higher batch size seems to give better metrics.
A higher patch size seems to give better metrics.
AdamW with a learning rate of 1e-4 seems to give better metrics.
Now let's try a 2x compact model with these settings:
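To make the setup concrete, the findings above translate into roughly the following training options. This is only a minimal sketch in Python-dict form; the actual configs are the attached files, and the key names used here (patch_size, batch_size_per_gpu, pixel_opt, mssim_opt, etc.) are illustrative assumptions, not the exact fields of my trainer.

# Illustrative option sketch for the initial 2x compact run (key names assumed).
train_options = {
    "scale": 2,
    "network_g": {"type": "Compact"},            # the Compact (SRVGGNet-style) architecture
    "datasets": {"train": {
        "patch_size": 64,                        # "patch" as used in the text; later 128, then 256
        "batch_size_per_gpu": 64,                # higher batch gave better metrics in the 10k tests
    }},
    "train": {
        "optim_g": {"type": "AdamW", "lr": 1e-4},
        "pixel_opt": {"type": "L1Loss", "loss_weight": 0.6},
        "mssim_opt": {"type": "MSSIMLoss", "loss_weight": 0.4},
    },
}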
After 300k iterations, where I had reached a psnry of 31.614, I had a thought: for 4x models the 2x-pretrain strategy is often used, where the previous 2x from-scratch model is used to initialize the 4x version. But what if we applied that pretrain strategy to the 2x model itself? I simply wanted a quick test of whether it would improve training.
Within the next 20k iterations, a new training run that loaded the previous model as a pretrain improved faster than simply letting the 300k version train some more. So I continued training this model by using itself as a pretrain, with the same settings:
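Concretely, "using itself as a pretrain" just means starting a fresh run that initializes the generator from the previous run's checkpoint instead of from random weights. A hedged sketch using BasicSR-style option names (the checkpoint path is a placeholder, not the real one):

# Sketch of the self-pretrain step (path is a placeholder).
resume_options = {
    "path": {
        "pretrain_network_g": "experiments/2x_compact_300k/models/net_g_300000.pth",
        "strict_load_g": True,   # the architecture is unchanged, so strict loading works
    }
}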
At 150k iterations I wanted to see what happens when I increase the patch size from 64 to 128 while keeping batch 64. This maxed out the available VRAM on my GPU (an RTX 3060 with 12 GB), again using the previous step as pretrain:
This setup I then trained for 170k iterations.
Then I thought: what happens if, after all this training, I max out the patch size? The highest I could go was 256, since my dataset has 512px HR tiles with corresponding 256px x2 LRs, so that's the upper limit. The batch size needed to be reduced to 16 because of my limited resources.
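In config terms, the only things that change for this last stage are the crop and batch settings (again only a sketch with assumed field names):

# Final-stage overrides (sketch): the dataset's 512px HR / 256px x2 tiles cap the
# patch size at 256, and batch is dropped to 16 to fit the available VRAM.
final_stage_overrides = {
    "patch_size": 256,
    "batch_size_per_gpu": 16,
}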
Here are all the graphs from the training runs. Even though the model keeps improving with longer training, the curves get flatter and flatter, so I think a psnry of 32 is roughly the limit I can reach with a 2x compact model on my dataset.
The config files of the runs are also attached, together with the final high-psnry 2x compact model.
The validation metrics I reached for the final state of this model, at which point I decided to end this experiment (note: these are Y-channel metrics, meaning psnry and ssimy):
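For reference, psnry and ssimy simply mean PSNR and SSIM computed on the luma (Y) channel after converting RGB to YCbCr, which is the usual SR benchmarking convention. A minimal sketch of the PSNR-Y part, assuming uint8 sRGB images, the BT.601 conversion, and a border crop equal to the scale factor (all common choices, not necessarily the exact settings of my validation code):

import numpy as np

def rgb_to_y(img_uint8: np.ndarray) -> np.ndarray:
    # HxWx3 uint8 RGB -> BT.601 luma (Y) channel, float in roughly the 16-235 range.
    img = img_uint8.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr: np.ndarray, hr: np.ndarray, crop_border: int = 2) -> float:
    # PSNR on the Y channel; borders are cropped by the scale factor (2 for a 2x model).
    y_sr, y_hr = rgb_to_y(sr), rgb_to_y(hr)
    if crop_border > 0:
        y_sr = y_sr[crop_border:-crop_border, crop_border:-crop_border]
        y_hr = y_hr[crop_border:-crop_border, crop_border:-crop_border]
    mse = np.mean((y_sr - y_hr) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)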