
[P1] Questions on differences between paper and code #95

Closed
calpt opened this issue May 31, 2024 · 2 comments
Labels: question (Further information is requested)
calpt commented May 31, 2024

Hi pyreft team, thanks for the interesting paper and the open-source library!

We're currently working on integrating a couple of ReFT methods into our AdapterHub Adapters library (see here) and came across a few questions while studying your codebase:

1. Weight tying: In section 4.1 of the paper, the following definitions for the tied and untied LoReFT variants are given:
[image: the paper's definitions of the tied and untied LoReFT variants]

From my interpretation of this definition, in the untied variant, a separate intervention with an independent set of parameters $\phi$ would be added for each intervened position $p$.

However, looking at the implementation, the untied variant seems to use one shared intervention for all prefixes and one shared intervention for all suffixes. E.g. here:

# position str takes the following formats:
# f1 -> first token; f2 -> first two tokens.
# f1+l1 -> first and last tokens; f2+l2 -> first and last two tokens.
# fn or ln shares the same intervention.
if "+" in position and not share_weights:
layers += layers

If share_weights is set to False, two interventions are created per layer: one for all prefixes and one for all suffixes (sketched below).
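To make my reading concrete, here is a minimal sketch of what I understand the snippet above to do; the layer numbers are made up and this is not pyreft's actual API:

# Hypothetical walk-through of the snippet above for position "f2+l2"
# with untied weights:
layers = [4, 8]                # layers to intervene on (made-up values)
position = "f2+l2"             # first two and last two tokens
share_weights = False

if "+" in position and not share_weights:
    layers += layers           # -> [4, 8, 4, 8]

# If one intervention is then created per list entry, each layer ends up
# with exactly two: one shared by both prefix positions and one shared by
# both suffix positions -- not one per intervened position p.
print(layers)                  # [4, 8, 4, 8]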

Is my understanding of the paper definition wrong?

2. DiReFT: In section 3.2 of the paper, DiReFT is defined as "(...) an ablation of LoReFT which removes the orthogonality constraint and the difference operation, reducing training time".

Looking at the implementation in pyreft, two related interventions are defined:

  • DireftIntervention removes the subtraction of the projected states, but seems to keep the orthogonality constraint:
    rotate_layer = LowRankRotateLayer(self.embed_dim, kwargs["low_rank_dimension"])
    self.rotate_layer = torch.nn.utils.parametrizations.orthogonal(rotate_layer)
  • NodireftIntervention removes both the subtraction and the orthogonality constraint:
    output = base + torch.matmul(
        self.act_fn(self.learned_source(base)), self.proj_layer.weight
    )

    Is my understanding correct that DiReFT in the paper corresponds to NodireftIntervention in the code, and that DireftIntervention in the code is different from DiReFT in the paper?
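For reference, here is a minimal, self-contained sketch of how I read the two forward passes; the dimensions, names, and the choice of torch.relu for act_fn are my assumptions, not pyreft's actual classes:

import torch

d, r = 16, 4                        # hidden size, low-rank dim (assumed values)
h = torch.randn(d)                  # a single hidden state

# DireftIntervention-style: keeps the orthogonality constraint on the
# low-rank projection, but drops LoReFT's "- Rh" difference term.
rotate_layer = torch.nn.utils.parametrizations.orthogonal(
    torch.nn.Linear(d, r, bias=False)
)
learned_source = torch.nn.Linear(d, r)
h_direft = h + torch.relu(learned_source(h)) @ rotate_layer.weight  # h + R^T f(Wh + b)

# NodireftIntervention-style: drops the difference term AND the
# orthogonality constraint; proj_layer is an ordinary linear map.
proj_layer = torch.nn.Linear(d, r, bias=False)
h_nodireft = h + torch.relu(learned_source(h)) @ proj_layer.weight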

Thanks in advance for your insights!

frankaging (Collaborator) commented May 31, 2024

@calpt Thanks for your questions, and thanks for integrating ReFT into your library! I haven't looked at your PR yet, but I'd certainly be happy to take a look and provide feedback if that's okay.

Here are inline responses to your questions:

Weight tying: In section 4.1 of the paper, the following definition for the tied and untied LoReFT variants ...

Re: I think your interpretation of tied and untied weights is correct! In our paper, "tied weights" means we only have one intervention per layer. For "untied weights", we have two interventions per layer: one shared among prefix tokens, and one shared among suffix tokens. Our two definitions on pg. 6 could be a little confusing, but essentially they try to illustrate that we remove the positional dependency and have layer-dependent interventions.
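As a quick hypothetical illustration (layer numbers and names are made up, not pyreft's API), the two schemes differ only in how many parameter sets exist per layer:

layers = [4, 8]

# tied weights: one intervention (parameter set phi) per layer
tied = {l: f"phi_{l}" for l in layers}                        # 2 parameter sets

# untied weights: two interventions per layer -- one shared across all
# prefix positions, one shared across all suffix positions
untied = {(l, side): f"phi_{l}_{side}"
          for l in layers for side in ("prefix", "suffix")}   # 4 parameter sets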

This is minor, but I also want to call out that the intervention config we introduced in the paper is itself pretty limited. One could also try to tie weights across layers, or across a set of locations and layers, etc. It would be great if your library could support that! We have limited bandwidth, but we are also working towards more flexible intervention schemas.

DiReFT: In section 3.2 of the paper, DiReFT is defined as "(...) an ablation of LoReFT which removes the orthogonality constraint ...

Re: Yes! Our naming is outdated. DiReFT in the paper is actually the NodireftIntervention. We will change this soon and rename the current DireftIntervention to LodireftIntervention!

@frankaging frankaging changed the title Questions on differences between paper and code [P1] Questions on differences between paper and code May 31, 2024
@frankaging frankaging self-assigned this May 31, 2024
@frankaging frankaging added the question Further information is requested label May 31, 2024

calpt (Author) commented Jun 1, 2024

Thanks for your quick & detailed answers!

I think your interpretation of tied and untied weights is correct! In our paper, "tied weights" means we only have one intervention per layer. For "untied weights", we have two interventions per layer: one shared among prefix tokens, and one shared among suffix tokens. Our two definitions on pg. 6 could be a little confusing, but essentially they try to illustrate that we remove the positional dependency and have layer-dependent interventions.

Got it, this makes sense. It might be useful to correct the definitions in section 4.1 in a future version of the paper to avoid confusion.

One could also try to tie weights across layers, or across a set of locations and layers, etc. It would be great if your library could support that! We have limited bandwidth, but we are also working towards more flexible intervention schemas.

These sound like sensible additional configurations to experiment with and integrate. For now, we're trying to cover the essential configurations from the paper to make sure we have a first version ready soon, but we're definitely open to extending beyond this afterwards.
