
custom 4d attention_mask as transformers .forward() argument #27493

Closed
poedator opened this issue Nov 14, 2023 · 2 comments
Comments


poedator commented Nov 14, 2023

Feature request

Somewhere inside transformers models, 2D attention masks are converted into 4D masks. I want to be able to pass my own custom 4D mask directly to `.forward()`. Presently, doing so raises an error.
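The 2D-to-4D conversion referred to above can be sketched as follows. This is a minimal re-implementation mirroring what `AttentionMaskConverter._expand_mask` does in the traceback below; the standalone function name and signature here are illustrative, not the library's public API:

```python
import torch
from typing import Optional

def expand_mask(mask_2d: torch.Tensor, dtype: torch.dtype,
                tgt_len: Optional[int] = None) -> torch.Tensor:
    """Expand a [bsz, src_len] padding mask to [bsz, 1, tgt_len, src_len]."""
    bsz, src_len = mask_2d.size()  # this unpacking is what fails for a 4D input
    tgt_len = tgt_len if tgt_len is not None else src_len
    expanded = mask_2d[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
    inverted = 1.0 - expanded
    # Positions with mask == 0 become a large negative additive bias ("masked out").
    return inverted.masked_fill(inverted.to(torch.bool), torch.finfo(dtype).min)

mask = torch.tensor([[1, 1, 0]])
print(expand_mask(mask, torch.float32).shape)  # torch.Size([1, 1, 3, 3])
```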
CODE EXAMPLE:

import torch
import transformers

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "openlm-research/open_llama_3b"
model = transformers.AutoModelForCausalLM.from_pretrained(model_name, device_map=device)

# prepare the KV cache with a first forward pass
size0 = 5
max_token = 10000
x0 = torch.randint(max_token, (1, size0), device=device)
y0 = model.forward(x0)

# forward with a custom 4d mask
size1 = 3
x1 = torch.randint(max_token, (1, size1), device=device)
mask_shape = (1, 1, size1, size0 + size1)  # bsz, head_dim=1, query_length, key_value_length
custom_mask = torch.randint(2, mask_shape, device=device)

model.forward(input_ids=x1, attention_mask=custom_mask, past_key_values=y0['past_key_values'])
# expected: a forward pass that uses this custom_mask

Error msg:

...
File .../transformers/src/transformers/modeling_attn_mask_utils.py:154, in AttentionMaskConverter._expand_mask(mask, dtype, tgt_len)
    149 @staticmethod
    150 def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
    151     """
    152     Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
    153     """
--> 154     bsz, src_len = mask.size()
    155     tgt_len = tgt_len if tgt_len is not None else src_len
    157     expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)

ValueError: too many values to unpack (expected 2)
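The unpacking fails because `_expand_mask` assumes a `[bsz, seq_len]` input. As a rough workaround sketch, assuming the attention layers ultimately consume an additive float mask (0.0 where attending, dtype minimum where masked; whether a model accepts such a mask directly is version-dependent), a 0/1 4D mask can be pre-converted into that format. `to_additive_mask` is a hypothetical helper, not a transformers API:

```python
import torch

def to_additive_mask(mask_4d: torch.Tensor,
                     dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Map 1 -> 0.0 (attend) and 0 -> dtype min (masked)."""
    additive = torch.zeros_like(mask_4d, dtype=dtype)
    additive.masked_fill_(mask_4d == 0, torch.finfo(dtype).min)
    return additive

custom_mask = torch.randint(2, (1, 1, 3, 8))  # bsz, 1, query_len, kv_len
print(to_additive_mask(custom_mask).shape)  # torch.Size([1, 1, 3, 8])
```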

Motivation

I need a custom 4D mask for experiments with causal inference.

Your contribution

I am ready to get involved, with guidance from HF.
Tagging @patrickvonplaten, who recently authored #27086.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@poedator
Contributor Author

UPD: this feature was discussed and implemented in #27539.
