[P1] Questions on differences between paper and code #95
Hi pyreft team, thanks for the interesting paper and the open-source library!
We're currently working on integrating a couple of ReFT methods into our AdapterHub Adapters library (see here) and came across a few questions while studying your codebase:
1. Weight tying: In section 4.1 of the paper, the following definition for the tied and untied LoReFT variants is given:
[equation image from section 4.1 of the paper; not reproduced here]
From my interpretation of this definition, in the untied variant, a separate intervention with an independent set of parameters $\phi$ would be added for each intervened position $p$.
However, looking at the implementation, in the untied variant there seems to be one shared intervention for all prefixes and one shared intervention for all suffixes. For example, in pyreft/examples/loreft/train.py (lines 161 to 166 at commit 3a2f193), two interventions are created per layer, one for all prefixes and one for all suffixes, if `share_weights` is set to `False` (see the sketch after the questions below). Is my understanding of the paper definition wrong?
2. DiReFT: In section 3.2 of the paper, DiReFT is defined as "(...) an ablation of LoReFT which removes the orthogonality constraint and the difference operation, reducing training time".
Looking at the implementation in pyreft, two related interventions are defined:
- `DireftIntervention` removes the subtraction of the projected states, but seems to keep the orthogonality constraint (pyreft/pyreft/interventions.py, lines 151 to 152 at commit 3a2f193).
- `NodireftIntervention` removes both the subtraction and the orthogonality constraint (pyreft/pyreft/interventions.py, lines 191 to 193 at commit 3a2f193).
Is my understanding correct that DiReFT in the paper corresponds to NoDiReFT in the code, and that DiReFT in the code is different from DiReFT in the paper?
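To make the two sharing schemes in question 1 concrete, here is a minimal PyTorch sketch. This is not pyreft's actual code: `LowRankIntervention`, `make_interventions`, and their arguments are illustrative stand-ins, and the orthogonality constraint LoReFT places on R is noted but omitted for brevity.

```python
import torch
from torch import nn


class LowRankIntervention(nn.Module):
    """Illustrative LoReFT-style edit: h + R^T (W h + b - R h).

    In LoReFT, R (rank x dim) has orthonormal rows, e.g. enforced via
    nn.utils.parametrizations.orthogonal; omitted here for brevity.
    """

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.rotate = nn.Linear(dim, rank, bias=False)  # R
        self.learned = nn.Linear(dim, rank)             # W, b

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        delta = self.learned(h) - self.rotate(h)        # (W h + b) - R h
        return h + delta @ self.rotate.weight           # h + R^T delta


def make_interventions(num_layers: int, dim: int, rank: int,
                       share_weights: bool) -> dict:
    """Build a per-layer intervention map mirroring the behavior
    described in question 1: one module per layer when weights are
    shared, otherwise one prefix and one suffix module per layer."""
    interventions = {}
    for layer in range(num_layers):
        if share_weights:
            # "tied": a single intervention reused at every intervened
            # position in this layer
            interventions[(layer, "all")] = LowRankIntervention(dim, rank)
        else:
            # "untied" as implemented: one module shared by all prefix
            # positions, a second shared by all suffix positions
            interventions[(layer, "prefix")] = LowRankIntervention(dim, rank)
            interventions[(layer, "suffix")] = LowRankIntervention(dim, rank)
    return interventions


# usage: edit the first and last two positions of a toy hidden state
hidden = torch.randn(1, 10, 512)
ivs = make_interventions(num_layers=2, dim=512, rank=4, share_weights=False)
with torch.no_grad():
    hidden[:, :2] = ivs[(0, "prefix")](hidden[:, :2])
    hidden[:, -2:] = ivs[(0, "suffix")](hidden[:, -2:])
```

Keying the map by (layer, group) also extends naturally to the cross-layer tying mentioned in the reply below, e.g. by reusing one module under several layer keys.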
Thanks in advance for your insights!
Comments

@calpt Thanks for your questions, and thanks for integrating ReFT into your library! I haven't looked at your PR yet, but I would certainly be happy to take a look and provide feedback if that's okay. Here are inline responses to your questions:

Re: I think your interpretation of tied and untied weights is correct! In our paper, "tied weights" means we have only one intervention per layer. For "untied weights", we have two interventions per layer: one shared among prefix tokens, and one shared among suffix tokens. Our two definitions on pg. 6 could be a little confusing, but essentially they try to illustrate that we remove the positional dependency and make the interventions layer-dependent. This is minor, but I also want to call out that the intervention config we introduced in the paper itself is pretty limited. One could also try to tie weights among layers, or among a set of locations and layers, etc. It would be great if your library could support that! We have limited bandwidth, but we are also working towards more flexible intervention schemas.

Re: Yes! Our namings are outdated. DiReFT in the paper is actually the `NodireftIntervention` in the code.

Thanks for your quick & detailed answers!

Got it, this makes sense. It might be useful to correct the definitions in section 4.1 in a future version of the paper to avoid confusion.

These sound like sensible additional configurations to experiment with and integrate. For now, we try to cover the essential configurations from the paper to make sure we have a first version ready soon, but we are definitely open to extending beyond this afterwards.
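To make the naming resolution above concrete, here is a small sketch of the three update rules as described in the paper and the question. This is illustrative code, not pyreft's implementation: the class and method names are hypothetical, and the orthogonality constraint on R is assumed to be enforced elsewhere (e.g. by a parametrization).

```python
import torch
from torch import nn


class ReftVariants(nn.Module):
    """Sketch of the three update rules discussed above (illustrative).

    R would carry an orthogonality constraint in LoReFT and in the
    code's DireftIntervention, e.g. via
    nn.utils.parametrizations.orthogonal(self.rotate); omitted here.
    """

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.rotate = nn.Linear(dim, rank, bias=False)  # R (orthogonal rows)
        self.learned = nn.Linear(dim, rank)             # W, b
        self.proj = nn.Linear(rank, dim, bias=False)    # unconstrained W2

    def loreft(self, h: torch.Tensor) -> torch.Tensor:
        # LoReFT: h + R^T (W h + b - R h)
        return h + (self.learned(h) - self.rotate(h)) @ self.rotate.weight

    def direft_as_in_code(self, h: torch.Tensor) -> torch.Tensor:
        # DireftIntervention: drops the "- R h" difference term but
        # still projects back through the constrained R
        return h + self.learned(h) @ self.rotate.weight

    def direft_as_in_paper(self, h: torch.Tensor) -> torch.Tensor:
        # Paper's DiReFT == code's NodireftIntervention: drops both the
        # orthogonality constraint and the difference operation
        return h + self.proj(self.learned(h))
```

Under this reading, `direft_as_in_code` keeps the constrained projection while dropping the difference term, whereas the paper's DiReFT drops both, matching the reply above.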