Update LLMModelFactory.swift #183

hasnat · 2025-01-23T07:02:35Z

Adds in DeepSeek (mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit) ModelConfiguration

I'm unsure what needs to be done for ModelTypeRegistry.creators, but it worked for me.
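A possible explanation for why no creators change was needed, sketched with stand-in types: a registry keyed by the `model_type` string from a checkpoint's config.json can already serve a new checkpoint if that checkpoint declares an architecture that is registered. The sketch below assumes the distilled checkpoint declares `model_type` "qwen2". `SketchModelConfiguration`, `SketchTypeRegistry`, and the default prompt are hypothetical names made up for illustration. They are not the MLXLLM types or their real signatures.

```swift
import Foundation

// Hypothetical stand-in for a model configuration entry (NOT the MLXLLM type).
struct SketchModelConfiguration {
    let id: String             // Hugging Face repo id
    let defaultPrompt: String  // illustrative prompt, not taken from the PR
}

// Hypothetical stand-in for a creator registry keyed by `model_type`.
struct SketchTypeRegistry {
    private var creators: [String: (String) -> String] = [:]

    // Register a creator closure for one architecture name.
    mutating func register(_ modelType: String, creator: @escaping (String) -> String) {
        creators[modelType] = creator
    }

    // Look up the creator by model type; returns nil when none is registered.
    func create(modelType: String, repoId: String) -> String? {
        creators[modelType]?(repoId)
    }
}

var registry = SketchTypeRegistry()
registry.register("qwen2") { repoId in
    "Qwen2-architecture model loaded from \(repoId)"
}

let deepSeekR1 = SketchModelConfiguration(
    id: "mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit",
    defaultPrompt: "Why is the sky blue?"
)

// Assumption: the checkpoint's config.json declares model_type "qwen2".
let declaredModelType = "qwen2"

if let description = registry.create(modelType: declaredModelType, repoId: deepSeekR1.id) {
    print(description)
} else {
    print("No creator registered for \(declaredModelType)")
}
```

If the assumption holds, only a new configuration entry is needed. A checkpoint with an unregistered `model_type` would hit the `nil` branch and would need a new creator as well.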


DePasqualeOrg · 2025-01-23T10:58:43Z

It's not apparent from the model's output here because of a silent error, but it's actually not using the chat template, which results in worse output. For that we need @pcuenca to create a new version tag for swift-transformers, ideally after merging my PR for tool use as well as his preferred formatting solution. This would allow mlx-swift-examples to use the latest version of Jinja and swift-transformers, which will enable function calling, chat templates for vision models, as well as support for some recent models (Phi-4 and DeepSeek R1).

awni · 2025-01-23T14:29:01Z

Thanks for the PR. Let's wait until Swift Transformers updates Jinja and tags a new release so we can update that here. I'm guessing @pcuenca will get to it soon :).

In the meantime, anyone who wants to try the model can clone this PR. Be sure to manually update the Jinja package; otherwise you will be using the model without a chat template, and it will give pretty bad results.

hasnat · 2025-01-23T18:46:26Z

Thanks for referencing. Happy to leave it here for reference or continuation. It's super interesting to read about vision and the possibility of seeing a mix of examples here. I was on the lookout for vision-based sample code, hoping to eventually plug it into Shortcuts.

Regards,
Hasnat

pcuenca · 2025-01-24T16:38:18Z

Sorry for the delay. There's still a problem when applying the chat template in swift-transformers, looking into it.

pcuenca · 2025-01-24T19:28:31Z

I just pushed swift-transformers 0.1.15, with the new Jinja engine, tokenization fixes that impacted the Deepseek tokenizer, fixes for Phi 4, and more.

awni · 2025-01-24T19:31:13Z

Thanks @pcuenca that's awesome!!

davidkoski · 2025-01-24T20:10:58Z

OK, so I think we just need to get the branch/tag pointers updated here, both in the xcodeproj and the Project.swift
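As a rough illustration of what updating the tag pointer could look like on the Swift Package Manager side, here is a minimal manifest sketch that requires swift-transformers 0.1.15 or later. The package name, target name, platform versions, and the `from:` requirement form are assumptions chosen for illustration. The actual manifest in this repository may pin the dependency differently, and the Xcode project stores its package reference separately.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "ExampleApp",  // hypothetical package name
    platforms: [.macOS(.v14), .iOS(.v16)],  // assumed minimums, adjust as needed
    dependencies: [
        // Accepts 0.1.15 and later releases below 1.0.0.
        .package(url: "https://github.com/huggingface/swift-transformers", from: "0.1.15"),
    ],
    targets: [
        .target(
            name: "ExampleApp",  // hypothetical target name
            dependencies: [
                .product(name: "Transformers", package: "swift-transformers"),
            ]
        ),
    ]
)
```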

hasnat added 2 commits January 23, 2025 07:00

Update LLMModelFactory.swift

6e0f5f9

Adds in DeepSeek (mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit) ModelConfiguration

Update LLMModelFactory.swift

0a02743

typo
