Hello!
Thank you for your work! It was a very interesting read.
I have a question regarding Table 1 and Table 2.
According to Table 2, using only Shared-Context Distillation already yields significant improvements. As I understand it, this setting is applied without normalization.
However, since no normalization is used in this setting, does the total loss function (9) still require L_lg? I assume L_lg would be analogous to L_sc but computed on different random patches, added on top of the total loss. Please correct me if I'm misunderstanding.
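To make the question concrete, the composition I have in mind is roughly the following. This is only my guess at the shape of equation (9); the weighting factor lambda and the exact form are my assumptions, not taken from the paper:

```latex
% Assumed form of the total loss (9); \lambda is a hypothetical weight.
% L_sc is computed on the shared-context crops; L_lg on separate random patches.
\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{sc} + \lambda \, \mathcal{L}_{lg}
```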
Best regards!