Support starcoder2 architecture #3089
Conversation
@esmeetu Could you check it for me?
@sh0416 Thanks for your quick PR! Left some comments, and could you add this new model to
Another request: could you help support other
I've addressed all of the comments, so please review my revision. There are two key differences in this branch.
Finally, I've tested Starcoder2-7b and Starcoder2-15b, and they produce correct outputs under the given CI test, but they are not included in the testcases due to the high computational cost of such large models. Do I have to report this result? Relevant link: https://huggingface.co/bigcode/starcoder2-3b/discussions/2
Oh, Starcoder2-15b does not pass the test. Please wait a minute.
Starcoder2-15b has different lm_head weights (instead of tying the embedding table), and now 15b passes the test.
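The tied-vs-untied distinction mentioned above can be illustrated with a minimal sketch (a toy module, not vLLM's actual implementation; the `tie_word_embeddings` flag mirrors the Hugging Face config field of the same name):

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy LM head illustrating tied vs. untied lm_head weights."""
    def __init__(self, vocab_size=16, hidden_size=8, tie_word_embeddings=True):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        if tie_word_embeddings:
            # Tied: lm_head reuses the embedding table (as in Starcoder2-3b/7b).
            self.lm_head.weight = self.embed_tokens.weight
        # Untied: lm_head keeps its own parameter, which must be loaded
        # separately from the checkpoint (as in Starcoder2-15b).

tied = TinyLM(tie_word_embeddings=True)
untied = TinyLM(tie_word_embeddings=False)
print(tied.lm_head.weight is tied.embed_tokens.weight)      # True
print(untied.lm_head.weight is untied.embed_tokens.weight)  # False
```

A weight loader that assumes tying would silently produce wrong logits for the 15b checkpoint, which is consistent with the test failure described above.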
@sh0416 Good job!
@esmeetu Thanks for the fast response. I don't know why some actions that are irrelevant to this development failed. Could you finalize this PR?
Head branch was pushed to by a user without write access
@sh0416 Yeah, how did you pass the test locally? It seems broken when loading the config file, because that repo doesn't have a custom configuration file.
My test procedure is as follows.
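The procedure itself was not captured here. Judging from the surrounding discussion (transformers must be installed from source, then the model-correctness test is run against the HF reference), it was presumably along these lines; the exact commands and test path below are a reconstruction, not the author's:

```shell
# Starcoder2 support was only on the transformers main branch at the time,
# so install transformers from source rather than from PyPI.
pip install git+https://github.com/huggingface/transformers.git

# From the PR branch of vLLM, run the model-correctness tests,
# which compare vLLM outputs against HF reference generations.
# (The exact test selector is an assumption.)
pytest tests/models -k starcoder2
```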
I will repeat this procedure on my local machine and reproduce the error you encountered, so please wait.
FYI, it seems that the Hugging Face server is under maintenance and is currently unstable.
It seems that installing transformers from source is mandatory.
I've tested it with your commit. Actually, the test requires reference generation results from HF transformers, so the error is raised when we check our testcases. However, the usage of vLLM itself is OK.
In my opinion, Starcoder2 will be merged in transformers 4.39.0, as it is already merged in the main branch, so this issue in the test should be resolved in the near future. Thank you.
@sh0416 Thanks for your input! Generally, we aim to support new models at the earliest possible time and make them more robust later. Given the importance of the StarCoder2 model, I think the current hack is acceptable. |
#3075
I did my best to support the starcoder2 architecture.
Since Hugging Face transformers currently supports starcoder2 only in the main development branch, transformers needs to be installed from source.
My test code is as follows.
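The test code was not captured in this thread. A minimal generation check using vLLM's public API would look roughly like this; the model name comes from the thread, but the prompt and script are a sketch, not the author's exact code:

```python
from vllm import LLM, SamplingParams

# Requires transformers installed from source for Starcoder2 support,
# plus a GPU large enough for the 3b checkpoint.
llm = LLM(model="bigcode/starcoder2-3b")

# Greedy decoding, so results are comparable to an HF reference run.
params = SamplingParams(temperature=0.0, max_tokens=64)

prompts = ["def fibonacci(n):"]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

Eyeballing that the completion is coherent code is a useful smoke test, but the CI test discussed above goes further and compares token-level outputs against HF transformers.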
It seems that the generation result is normal, but I would like my code reviewed to check whether there are any bugs.
Because the starcoder2 codebase comes from Mistral and GPT BigCode, I referred to both of those files in this project.