Upgrade Transformers to v4.43.x (adapter-hub#727)
Changes required for sync:
- re-copy Llama & Beit attention
- add clip sdp & flash attn
- fix tie_weights method
- upgrade torch version in tests

---------

Co-authored-by: Leon Engländer <[email protected]>
dainis-boumber and lenglaender committed Aug 30, 2024
1 parent 29916f8 commit 7a247a1
Showing 9 changed files with 215 additions and 32 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/tests_torch.yml
@@ -39,7 +39,7 @@ jobs:
key: ${{ runner.os }}-pip-${{ hashFiles('setup.py') }}
- name: Install
run: |
- pip install torch==2.1.2
+ pip install torch==2.3
pip install .[quality]
- name: Check Quality and Repo Consistency
run: |
@@ -62,7 +62,7 @@ jobs:
${{ runner.os }}-pip-
- name: Install
run: |
- pip install torch==2.1.2
+ pip install torch==2.3
pip install .[sklearn,testing,sentencepiece]
- name: Test
run: |
@@ -85,7 +85,7 @@ jobs:
${{ runner.os }}-pip-
- name: Install
run: |
- pip install torch==2.1.2
+ pip install torch==2.3
pip install .[sklearn,testing,sentencepiece]
- name: Test
run: |
@@ -108,7 +108,7 @@ jobs:
${{ runner.os }}-pip-
- name: Install
run: |
- pip install torch==2.1.2
+ pip install torch==2.3
pip install .[sklearn,testing,sentencepiece]
pip install conllu seqeval
- name: Test Examples
14 changes: 10 additions & 4 deletions docs/huggingface_hub.md
@@ -17,6 +17,7 @@ Alternatively, all adapters on the Hugging Face Model Hub are also listed on [ht

After you have found an adapter you would like to use, loading it into a Transformer model is easy.
For example, for loading and activating the adapter [`AdapterHub/roberta-base-pf-sick`](https://huggingface.co/AdapterHub/roberta-base-pf-sick), write:

```python
from adapters import AutoAdapterModel

@@ -34,20 +35,23 @@ For more options and information, e.g. for managing models via the CLI and Git,

1. **Prepare access credentials**: Before being able to push to the Hugging Face Model Hub for the first time, we have to store our access token in the cache.
This can be done via the `huggingface-cli` by running:
```

```sh
huggingface-cli login
```

2. **Push an adapter**: Next, we can proceed to upload our first adapter.
Let's say we have a standard pre-trained Transformers model with an existing adapter named `awesome_adapter` (e.g. added via `model.add_adapter("awesome_adapter")` and [trained](training.md) afterwards).
We can now push this adapter to the Model Hub using `model.push_adapter_to_hub()` like this:
```python
model.push_adapter_to_hub(
"my-awesome-adapter",
"awesome_adapter",
datasets_tag="imdb"
)
```
This will create a repository `my-awesome-adapter` under your username, generate a default adapter card as `README.md` and upload the adapter named `awesome_adapter` together with the adapter card to the new repository.
`datasets_tag` provides additional information for categorization.
@@ -56,12 +60,14 @@ For more options and information, e.g. for managing models via the CLI and Git,
All adapters uploaded to Hugging Face's Model Hub are automatically also listed on AdapterHub.ml. Thus, for better categorization, ``datasets_tag`` is helpful when uploading a new adapter to the Model Hub. ``datasets_tag`` specifies the dataset the adapter was trained on as an identifier from `Hugging Face Datasets <https://huggingface.co/datasets>`_.
```

Voilà! Your first adapter is on the Hugging Face Model Hub.
Anyone can now run:
```

```python
model.load_adapter("<your_username>/my-awesome-adapter")
```

To update your adapter, simply run `push_adapter_to_hub()` with the same repository name again. This will push a new commit to the existing repository.
You can find the full documentation of `push_adapter_to_hub()` [here](adapters.hub_mixin.PushAdapterToHubMixin.push_adapter_to_hub).
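As a supplementary illustration (not part of the diff), here is a minimal end-to-end sketch combining the calls shown in this documentation page; the base model, adapter names, and username are placeholders.

```python
from adapters import AutoAdapterModel

# Placeholder base model; any model supported by the adapters library works.
model = AutoAdapterModel.from_pretrained("roberta-base")

# Add an adapter and train it (training loop omitted, see training.md).
model.add_adapter("awesome_adapter")
# ... training ...

# Push the trained adapter; this creates the repository "my-awesome-adapter"
# under your username and generates a default adapter card.
model.push_adapter_to_hub(
    "my-awesome-adapter",
    "awesome_adapter",
    datasets_tag="imdb",
)

# Anyone can then load and activate the published adapter in a single call.
adapter_name = model.load_adapter("<your_username>/my-awesome-adapter", set_active=True)
```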
4 changes: 3 additions & 1 deletion docs/loading.md
@@ -55,11 +55,13 @@ adapter_name = model.load_adapter('sst-2')

In the minimal case, that's everything we need to specify to load a pre-trained task adapter for sentiment analysis, trained on the `sst-2` dataset using BERT base and a suitable adapter configuration.
The name of the adapter is returned by [`load_adapter()`](adapters.ModelWithHeadsAdaptersMixin.load_adapter), so we can [activate it](adapter_composition.md) in the next step:

```python
model.set_active_adapters(adapter_name)
```

As the second example, let's have a look at how to load an adapter based on the [`AdapterInfo`](adapters.utils.AdapterInfo) returned by the [`list_adapters()`](adapters.utils.list_adapters) method from [above](#finding-pre-trained-adapters):

```python
from adapters import AutoAdapterModel, list_adapters

@@ -93,4 +95,4 @@ We will go through the different arguments and their meaning one by one:
- By default, the `load_adapter()` method will add the loaded adapter using the identifier string given as the first argument.
To load the adapter using a custom name, we can use the `load_as` parameter.

- Finally, `set_active` will directly activate the loaded adapter for usage in each model forward pass. Otherwise, you have to manually activate the adapter via `set_active_adapters()`.
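A short, hedged sketch of the `load_as` and `set_active` arguments discussed above; the adapter identifier is the `sst-2` example from this page and the base model is a placeholder.

```python
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")  # placeholder model

# Load the sst-2 adapter under a custom local name and activate it right away,
# so a separate set_active_adapters() call is not needed.
adapter_name = model.load_adapter(
    "sst-2",              # identifier of the pre-trained adapter, as in the example above
    load_as="sentiment",  # custom name instead of the identifier string
    set_active=True,      # activate the adapter for every forward pass
)
print(adapter_name)  # expected to be "sentiment" when load_as is set
```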
4 changes: 2 additions & 2 deletions setup.py
@@ -57,8 +57,8 @@
"sphinx-intl==2.1.0",
"sphinx-multiversion==0.2.4",
"timeout-decorator",
"torch>=1.10,!=1.12.0",
"transformers~=4.42.4",
"torch",
"transformers~=4.43.3",
]


16 changes: 10 additions & 6 deletions src/adapters/heads/model_mixin.py
@@ -53,8 +53,8 @@ class ModelWithFlexibleHeadsAdaptersMixin(ModelWithHeadsAdaptersMixin):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._convert_to_flex_head = True
- if not hasattr(self.config, "custom_heads"):
-     self.config.custom_heads = {}
+ if not hasattr(self, "custom_heads"):
+     self.custom_heads = {}
self._active_heads = []

def head_type(head_type_str: str):
@@ -88,6 +88,8 @@ def _init_head_modules(self):
for head_name, config in self.config.prediction_heads.items():
self.add_prediction_head_from_config(head_name, config)

+ self._add_tied_weights_keys()

# The following methods are required for handling LM heads

def get_output_embeddings(self) -> Union[nn.Module, List[nn.Module]]:
@@ -132,6 +134,8 @@ def tie_weights(self):
self = getattr(self, self.base_model_prefix)
self._tie_encoder_decoder_weights(self.encoder, self.decoder, self.base_model_prefix)

+ super().tie_weights()

def _resize_token_embeddings(self, new_num_tokens, pad_to_multiple_of=None):
old_embeddings = self.get_input_embeddings()
new_embeddings = self._get_resized_embeddings(old_embeddings, new_num_tokens, pad_to_multiple_of)
@@ -174,7 +178,7 @@ def add_prediction_head_from_config(
head_class = MODEL_HEAD_MAP[head_type]
head = head_class(self, head_name, **config)
self.add_prediction_head(head, overwrite_ok=overwrite_ok, set_active=set_active)
- elif head_type in self.config.custom_heads:
+ elif head_type in self.custom_heads:
# we have to re-add the head type for custom heads
self.add_custom_head(head_type, head_name, overwrite_ok=overwrite_ok, **config)
else:
@@ -191,7 +195,7 @@ def get_prediction_heads_config(self):
return heads

def register_custom_head(self, identifier, head):
- self.config.custom_heads[identifier] = head
+ self.custom_heads[identifier] = head

@property
def active_head(self) -> Union[str, List[str]]:
@@ -251,8 +255,8 @@ def set_active_adapters(
)

def add_custom_head(self, head_type, head_name, overwrite_ok=False, set_active=True, **kwargs):
- if head_type in self.config.custom_heads:
-     head = self.config.custom_heads[head_type](self, head_name, **kwargs)
+ if head_type in self.custom_heads:
+     head = self.custom_heads[head_type](self, head_name, **kwargs)
# When a build-in head is added as a custom head it does not have the head_type property
if not hasattr(head.config, "head_type"):
head.config["head_type"] = head_type
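Since `custom_heads` now lives on the model instead of `model.config`, a rough sketch of how the relocated registry is exercised from user code follows; the head type name and base model are assumptions, and the only contract taken from the diff is that `add_custom_head()` instantiates the registered class as `custom_heads[head_type](model, head_name, **kwargs)`.

```python
from adapters import AutoAdapterModel
from adapters.heads import ClassificationHead  # built-in head, registered here as a custom head

model = AutoAdapterModel.from_pretrained("bert-base-uncased")  # placeholder model

# Register a head class under a custom type identifier; after this change the
# registry is a plain attribute on the model rather than on model.config.
model.register_custom_head("my_classification", ClassificationHead)
assert "my_classification" in model.custom_heads

# add_custom_head() instantiates the registered class with (model, head_name, **kwargs).
model.add_custom_head("my_classification", "my_head", num_labels=2)
```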
15 changes: 10 additions & 5 deletions src/adapters/models/beit/modeling_beit.py
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" PyTorch BEiT model."""
"""PyTorch BEiT model."""


import math
@@ -35,6 +35,7 @@ def forward(
output_attentions: bool = False,
relative_position_bias: Optional["BeitRelativePositionBias"] = None,
interpolate_pos_encoding: bool = False,
+ resolution: Optional[Tuple[int]] = None,
) -> Union[Tuple[torch.Tensor], Tuple[torch.Tensor, torch.Tensor]]:
mixed_query_layer = self.query(hidden_states)

@@ -51,9 +52,11 @@

# Add relative position bias if present.
if self.relative_position_bias is not None:
+ height, width = resolution
+ window_size = (height // self.config.patch_size, width // self.config.patch_size)
attention_scores = attention_scores + self.relative_position_bias(
-     interpolate_pos_encoding, attention_scores.shape[2]
- ).unsqueeze(0)
+     window_size, interpolate_pos_encoding, dim_size=hidden_states.shape[1]
+ )

# Add shared relative position bias if provided.
if relative_position_bias is not None:
@@ -89,15 +92,17 @@ def forward(
hidden_states: torch.Tensor,
head_mask: Optional[torch.Tensor] = None,
output_attentions: bool = False,
- relative_position_bias: Optional[BeitRelativePositionBias] = None,
+ relative_position_bias: Optional["BeitRelativePositionBias"] = None,
interpolate_pos_encoding: bool = False,
+ resolution: Optional[Tuple[int]] = None,
) -> Union[Tuple[torch.Tensor], Tuple[torch.Tensor, torch.Tensor]]:
self_attention_outputs = self.attention(
self.layernorm_before(hidden_states), # in BEiT, layernorm is applied before self-attention
head_mask,
output_attentions=output_attentions,
relative_position_bias=relative_position_bias,
interpolate_pos_encoding=interpolate_pos_encoding,
+ resolution=resolution,
)
attention_output = self_attention_outputs[0]
outputs = self_attention_outputs[1:] # add self attentions if we output attention weights
Expand Down Expand Up @@ -125,4 +130,4 @@ def forward(

outputs = (layer_output,) + outputs

return outputs
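The re-copied BEiT attention derives the relative position bias window from the actual input resolution rather than from the attention score shape; a standalone sketch of that arithmetic, with illustrative numbers, is given below.

```python
# Illustrative values: a 224x224 image with BEiT's commonly used 16x16 patches.
resolution = (224, 224)
patch_size = 16

height, width = resolution
window_size = (height // patch_size, width // patch_size)

print(window_size)                       # (14, 14)
print(window_size[0] * window_size[1])   # 196 patch tokens per image
```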