
Shared-Context Distillation - No Normalization Loss Function #23

Open
Etzelkut opened this issue Mar 11, 2025 · 1 comment
Comments

@Etzelkut

Hello!
Thank you for your work! It was a very interesting read.

I have a question regarding Table 1 and Table 2.
According to Table 2, using only Shared-Context Distillation already leads to significant improvements. As I understand it, this setting is applied without normalization.

However, since no normalization is used in this setting, does the total loss function (9) still require L_lg? I assume L_lg would be analogous to L_sc but applied to different random patches and added on top of the total loss. Please correct me if I'm misunderstanding.
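
For concreteness, this is the form I am assuming for equation (9); the notation and the exact decomposition are my guess from the text, so please correct it if the paper defines it differently:

$$
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{sc}} + \mathcal{L}_{\text{lg}},
$$

where $\mathcal{L}_{\text{sc}}$ is the shared-context distillation loss on one set of random patches and $\mathcal{L}_{\text{lg}}$ would be the analogous term computed on a different set of random patches.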

Best regards!

@Etzelkut
Author

I suppose this misunderstanding comes from Figure 3, where Shared-Context Distillation includes L_lg.
