As stated in Section 3.2, one of the motivations is to reduce the interference between the training dependencies of small and large scales. However, in the practical implementation, both the drafter and the refiner appear to be trained over all scales, and in particular there is no special design for drafter training. Could you explain this?
For the drafter, we fine-tune the original pre-trained VAR-30 only on the small scales (for example, scales 1-7).
For the refiner, we conduct a two-stage fine-tuning of the pre-trained VAR-16. First, we perform knowledge distillation over all scales. Then, we perform knowledge distillation only on the large scales (for example, scales 8-10).
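For intuition, here is a minimal sketch of how a loss can be restricted to a subset of scales during fine-tuning. This is not the repository's actual training code; the names `scale_masked_loss`, `patch_nums`, `logits`, and `targets` are assumptions made for illustration only.

```python
# Hypothetical sketch: cross-entropy restricted to selected scales.
import torch
import torch.nn.functional as F

def scale_masked_loss(logits, targets, patch_nums, active_scales):
    """Compute cross-entropy only over token positions of `active_scales`.

    logits:        (B, L, V) predictions over all scales concatenated
    targets:       (B, L)    ground-truth token indices
    patch_nums:    per-scale side lengths, e.g. (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)
    active_scales: iterable of 0-based scale indices to train on, e.g. range(0, 7)
    """
    active = set(active_scales)
    lengths = [pn * pn for pn in patch_nums]   # tokens contributed by each scale
    mask = torch.zeros(sum(lengths), dtype=torch.bool, device=logits.device)
    start = 0
    for i, n in enumerate(lengths):
        if i in active:
            mask[start:start + n] = True       # keep only the chosen scales
        start += n
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")  # (B, L)
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```

Under this sketch, the drafter would be fine-tuned with `active_scales=range(0, 7)`, while the refiner's second distillation stage would use only the large-scale indices (e.g. `range(7, 10)`), so gradients from the other scales never interfere.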