Update LLMModelFactory.swift #183
base: main
Adds in DeepSeek (mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit) ModelConfiguration
It's not apparent from the model's output here because of a silent error, but it's actually not using the chat template, which results in worse output. To fix that, we need @pcuenca to create a new version tag for swift-transformers, ideally after merging my PR for tool use as well as his preferred formatting solution. This would allow mlx-swift-examples to use the latest versions of Jinja and swift-transformers, which will enable function calling, chat templates for vision models, and support for some recent models (Phi-4 and DeepSeek R1).
Thanks for the PR. Let's wait until Swift Transformers updates Jinja and tags a new release so we can update that here. I'm guessing @pcuenca will get to it soon :). In the meantime, anyone who wants to try the model can clone this PR, but be sure to manually update the Jinja package; otherwise you will be using the model without a chat template and it will give pretty bad results.
Thanks for referencing. Happy to leave it here for reference or continuation. Super interesting to read about vision and the possibility of seeing a mix of examples here. I was on the lookout for vision-based sample code, hoping to eventually plug it into Shortcuts. Regards, Hasnat (sent by email in reply to Awni Hannun's comment of Jan 23, 2025)
Sorry for the delay. There's still a problem when applying the chat template in |
I just pushed swift-transformers 0.1.15, with the new Jinja engine, tokenization fixes that impacted the DeepSeek tokenizer, fixes for Phi 4, and more.
Thanks @pcuenca, that's awesome!!
OK, so I think we just need to get the branch/tag pointers updated here, both in the xcodeproj and the Project.swift |
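For anyone following along, a rough sketch of what the tag bump might look like in a Swift package manifest. This is not this repo's actual manifest: the package name, target name, and tools version are placeholders I made up, and only the swift-transformers URL and the 0.1.15 version come from the discussion above. The Xcode project would need the same version change in its package dependency settings.

```swift
// swift-tools-version:5.9
// Hypothetical manifest; "ExampleApp" is a placeholder, not a target in this repo.
import PackageDescription

let package = Package(
    name: "ExampleApp",
    dependencies: [
        // Require at least the release that ships the new Jinja engine.
        .package(url: "https://github.com/huggingface/swift-transformers", from: "0.1.15")
    ],
    targets: [
        .target(
            name: "ExampleApp",
            dependencies: [
                // Library product exported by swift-transformers.
                .product(name: "Transformers", package: "swift-transformers")
            ]
        )
    ]
)
```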
Adds in DeepSeek (mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit) ModelConfiguration
Am unsure what needs to be done for ModelTypeRegistry.creators, but it worked for me.
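A minimal, self-contained sketch of the shape of this change, for readers who have not opened the diff. The types below are simplified stand-ins written for illustration, not the real MLXLLM declarations; the default prompt is invented. The lookup at the end illustrates a likely reason nothing had to change in ModelTypeRegistry.creators: as far as I can tell, this distilled checkpoint declares the "qwen2" model type in its config, and that architecture is already registered. Treat that as an assumption to verify against the model's config.json.

```swift
// Simplified stand-in for the library's model configuration (assumption).
struct ModelConfiguration: Equatable {
    let id: String             // Hugging Face repo id
    let defaultPrompt: String  // used when the caller supplies no prompt
}

// Stand-in registry of known configurations (assumption).
enum ModelRegistry {
    // The configuration this PR adds; the prompt text is a made-up example.
    static let deepSeekR1_7B_4bit = ModelConfiguration(
        id: "mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit",
        defaultPrompt: "Is 9.9 or 9.11 greater?"
    )

    static let all: [ModelConfiguration] = [deepSeekR1_7B_4bit]

    // Look up a configuration by its repo id.
    static func configuration(id: String) -> ModelConfiguration? {
        all.first { $0.id == id }
    }
}

// Stand-in for the type registry: model type string -> architecture name.
// The real registry maps to model constructors; strings keep the sketch runnable.
let creators: [String: String] = ["qwen2": "Qwen2Model"]

// Assumed model type of the distilled checkpoint; already present above,
// so no new creator entry would be needed.
let distillModelType = "qwen2"
let architecture = creators[distillModelType]
```

If the checkpoint used an architecture missing from the registry, the lookup would return nil and a new creator entry would be required as well.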