The attention scores are always None in CachedMultiHeadAttention #2055

Open
apehex opened this issue Jan 23, 2025 · 0 comments
apehex commented Jan 23, 2025

Describe the bug

The local variable attention_scores introduced at line 111 of CachedMultiHeadAttention is always None.

To Reproduce

Since attention_scores is an internal variable, I copied the CachedMultiHeadAttention (CMHA) subclass into this Colab script:
https://colab.research.google.com/drive/1ZUS4mjDQktovKiJ8TQ7zYtm4PGjesXvG?usp=sharing
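A minimal sketch of the same reproduction, for reference. The import path is assumed, and the _query_dense / _key_dense / _value_dense / _compute_attention calls simply replay what the layer's call() already does internally:

```python
import numpy as np
import keras
# Import path assumed; adjust to wherever CachedMultiHeadAttention lives
# in your keras-hub install.
from keras_hub.src.layers.modeling.cached_multi_head_attention import (
    CachedMultiHeadAttention,
)

layer = CachedMultiHeadAttention(num_heads=2, key_dim=8)
x = np.random.normal(size=(1, 4, 16)).astype("float32")
layer(query=x, value=x)  # build the layer and its projection sub-layers

# Replay the projections the same way call() does, then inspect the scores:
q = layer._query_dense(x)
k = layer._key_dense(x)
v = layer._value_dense(x)
_, scores = layer._compute_attention(q, k, v)
print(scores)  # None with recent Keras versions
```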

Expected behavior

The variable attention_scores should contain the attention weights between query and key (the softmaxed dot products), which are useful for debugging a model IMHO.

Additional context

In recent Keras versions, the parent class MultiHeadAttention saves the argument return_attention_scores in self._return_attention_scores.

Then, the method _compute_attention checks this private attribute to decide whether to return the scores.
Since CachedMultiHeadAttention.call never sets this attribute, the attention scores are never returned.
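
For contrast, here is a short sketch showing that the flag works through the public call() argument on the plain Keras layer, which is exactly the path CMHA no longer exposes (shapes and layer sizes are arbitrary):

```python
import numpy as np
import keras

mha = keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
x = np.random.normal(size=(1, 4, 16)).astype("float32")

# Public path: the flag is forwarded by call(), so real scores come back.
_, scores = mha(x, x, return_attention_scores=True)
print(scores.shape)  # (1, 2, 4, 4) -> (batch, heads, query_len, key_len)

# CachedMultiHeadAttention.call has no such argument and never sets
# self._return_attention_scores, so its internal _compute_attention call
# always returns (output, None).
```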

I'll also open an issue on Keras proposing to turn the private attribute _return_attention_scores into an explicit argument.

Would you like to help us fix it?

Yes, I have two potential fixes:

  1. drop the attention scores entirely, which would be consistent since the corresponding return_attention_scores argument has already been removed from CMHA's call signature
  2. re-introduce the argument and set the private attribute _return_attention_scores accordingly (see the sketch after this list)

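A rough sketch of option 2. The keyword list mirrors CMHA's current call() signature as I understand it; only the return_attention_scores handling is new:

```python
import keras


class CachedMultiHeadAttention(keras.layers.MultiHeadAttention):
    def call(
        self,
        query,
        value,
        key=None,
        attention_mask=None,
        cache=None,
        cache_update_index=None,
        return_attention_scores=False,  # re-introduced argument
        training=None,
    ):
        # Mirror what MultiHeadAttention.call does in recent Keras, so that
        # the subsequent self._compute_attention(...) call actually keeps
        # the scores instead of dropping them.
        self._return_attention_scores = return_attention_scores
        # ... rest of the existing call() body unchanged, including
        # attention_output, attention_scores = self._compute_attention(...)
        ...
```
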
WDYT?
