Training #340

Open
XyFighting opened this issue Dec 12, 2024 · 6 comments

Comments

@XyFighting

Hi! Thank you for your excellent work!
The code runs fine on my local device, but on the server it is very slow, as shown in the screenshot below. Could you suggest a way to fix this?
[Screenshot: issue]

@MzeroMiko
Owner

Do you see the warning 'Triton not installed, fall back to pytorch implements.' or 'Can not import selective_scan_cuda_oflex. This affects speed.' in the log file?
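
A minimal sketch of how one might check a log for these two warnings. The warning strings are the ones quoted above; the function name `find_fallback_warnings` and the log file name in the usage comment are hypothetical, not part of the repository.

```python
# The two warning messages quoted in this thread.
KNOWN_WARNINGS = (
    "Triton not installed, fall back to pytorch implements.",
    "Can not import selective_scan_cuda_oflex. This affects speed.",
)


def find_fallback_warnings(log_text):
    """Return the known fallback warnings that occur in log_text."""
    return [w for w in KNOWN_WARNINGS if w in log_text]


# Hypothetical usage, assuming the training log was saved as train.log:
#     with open("train.log") as f:
#         print(find_fallback_warnings(f.read()))
```

An empty result would suggest that neither fallback was triggered, so the slowdown probably has another cause.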

@XyFighting
Author

Thank you for your reply. I do see the warning 'Can not import selective_scan_cuda_oflex', but on my local device the speed is normal even with that warning. Do you have any further suggestions?

@MzeroMiko
Owner

In the code, we first try to import oflex (kernels/selective_scan). If that is not found, we try selective_scan_cuda_core (which is actually deprecated). If that is not found either, we try selective_scan_cuda (the standard mamba implementation). If none of these are found, we fall back to the torch version so that the code is at least usable.

So I guess that in your environment none of the accelerated patches are installed correctly.
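
The fallback order described above can be sketched roughly as follows. This is an illustration only, not the repository's actual import code: the helper `pick_backend` and the `"torch"` return value are hypothetical, and only the three module names come from this thread.

```python
import importlib

# Order described in this thread: oflex first, then the deprecated core build,
# then the standard mamba kernel.
CANDIDATES = (
    "selective_scan_cuda_oflex",
    "selective_scan_cuda_core",
    "selective_scan_cuda",
)


def pick_backend(candidates=CANDIDATES, importer=importlib.import_module):
    """Return the name of the first importable kernel, or "torch" if none is."""
    for name in candidates:
        try:
            importer(name)
        except ImportError:
            # Not installed (or failed to load): try the next candidate.
            continue
        return name
    # No accelerated kernel could be imported: use the slow pure-torch path.
    return "torch"


print(pick_backend())
```

If this prints torch on the server but one of the kernel names on the local machine, that would be consistent with the speed difference reported above.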

[Screenshot]

@XyFighting
Author

Thank you for your reply. How do I install selective_scan_cuda_oflex?

@MzeroMiko
Owner

You can cd into kernels/selective_scan and run: pip install .
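
After the install finishes, one way to confirm that the module is now visible to the Python environment used on the server is a quick check like the one below. This is a sketch under assumptions: the helper name `is_installed` is hypothetical, and it only checks that the module can be located, not that it loads correctly against the installed CUDA and PyTorch versions.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


# Module name taken from the warning message quoted in this thread.
print("selective_scan_cuda_oflex installed:", is_installed("selective_scan_cuda_oflex"))
```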

@XyFighting
Author

Thank you for your reply. I have solved the issue above.
