FlashAttention only supports Ampere GPUs or newer. #1292
-
One of my GPUs is a Titan Xp. From what I've looked up, this is a FlashAttention issue: newer versions of transformers automatically call FlashAttention, but downgrading transformers conflicts with some other libraries. What can I do?
Answered by zRzRzRzRzRzRzR, Jan 29, 2025
-
FlashAttention only supports the Ampere architecture (RTX 30-series and newer), and our model also requires BF16 for inference, so this is a hardware limitation; there is really no way around it.
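For reference, a minimal sketch (not from this thread) of how the attention backend can be pinned when loading a model with transformers, so FlashAttention is never selected on a pre-Ampere card such as the Titan Xp. The `model_id` is a placeholder, and as the answer notes, the model's BF16 requirement remains a hardware limitation that this does not address.

```python
# Sketch only: explicitly choose a non-FlashAttention backend at load time.
# "model_id" below is a hypothetical placeholder, not the model from the thread.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,      # pre-Ampere cards lack native BF16 support
    attn_implementation="sdpa",     # or "eager"; avoids "flash_attention_2"
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```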