As stated in Section 3.2, one of the motivations is to reduce the interference between the training dependencies of small and large scales. However, in the practical implementation, both the drafter and the refiner appear to be trained over all scales, and in particular there is no special design for drafter training. Could you explain this?
For the drafter, we fine-tune the original pre-trained VAR-30 only on the small scales (for example, scales 1-7).
For the refiner, we conduct a two-stage fine-tuning of the pre-trained VAR-16. First, we perform knowledge distillation over all scales. Then, we perform knowledge distillation only on the large scales (for example, scales 8-10).
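For intuition, here is a minimal sketch of how a loss can be restricted to a subset of scales during fine-tuning. This is not the repository's actual training code; the names `scale_masked_loss`, `patch_nums`, `logits`, and `targets` are assumptions made for illustration only.

```python
# Hypothetical sketch: cross-entropy restricted to selected scales.
import torch
import torch.nn.functional as F

def scale_masked_loss(logits, targets, patch_nums, active_scales):
    """Compute cross-entropy only over token positions of `active_scales`.

    logits:        (B, L, V) predictions over all scales concatenated
    targets:       (B, L)    ground-truth token indices
    patch_nums:    per-scale side lengths, e.g. (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)
    active_scales: iterable of 0-based scale indices to train on, e.g. range(0, 7)
    """
    active = set(active_scales)
    lengths = [pn * pn for pn in patch_nums]   # tokens contributed by each scale
    mask = torch.zeros(sum(lengths), dtype=torch.bool, device=logits.device)
    start = 0
    for i, n in enumerate(lengths):
        if i in active:
            mask[start:start + n] = True       # keep only the chosen scales
        start += n
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")  # (B, L)
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```

Under this sketch, the drafter would be fine-tuned with `active_scales=range(0, 7)`, while the refiner's second distillation stage would use only the large-scale indices (e.g. `range(7, 10)`), so gradients from the other scales never interfere.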