Support Mixtral on macOS #1558

Merged · 1 commit · Jan 8, 2024

Conversation

@junrushao (Member) commented on Jan 8, 2024:

A follow-up of my previous PR (#1529).

This PR makes Mixtral work on the Metal GPUs that macOS comes with. Honestly, not much change is needed, except that Metal doesn't support the fp64 data type.
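Since Metal lacks fp64, any double-precision values have to be expressed in fp32 instead. As a generic illustration of what that down-cast gives up (this is not the PR's actual code, just a stdlib sketch), round-tripping a Python float through IEEE-754 single precision discards the extra digits:

```python
import struct

def to_fp32(x: float) -> float:
    """Round-trip a Python float (fp64) through IEEE-754 fp32,
    mimicking the down-cast required on backends without fp64."""
    return struct.unpack("f", struct.pack("f", x))[0]

value = 1.0000001234567891  # carries more precision than fp32 can hold
print(to_fp32(value))  # nearby fp32-representable value; trailing digits are lost
```

Values already representable in fp32 (e.g. `0.5`) survive the round-trip exactly; everything else snaps to the nearest fp32-representable number, which is usually acceptable for inference workloads.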

A Python script to run Mixtral:

```python
from mlc_chat import ChatConfig, ChatModule, callback
from mlc_chat.support import logging

logging.enable_logging()

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"
NUM_GPU = 1

def main():
    cm = ChatModule(
        MODEL,
        chat_config=ChatConfig(
            sliding_window_size=1024,
            tensor_parallel_shards=NUM_GPU,
        ),
    )
    cm.generate(
        "What is the meaning of life?",
        progress_callback=callback.StreamToStdout(callback_interval=2),
    )

if __name__ == "__main__":
    main()
```

Quantization formats:

- 3-bit (19.662 GB): ["HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q3f16_1-MLC"](https://huggingface.co/junrushao/Mixtral-8x7B-Instruct-v0.1-q3f16_1-MLC)
- 4-bit (24.466 GB): ["HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"](https://huggingface.co/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC)
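To switch between these prebuilt weight sets, only the `MODEL` string in the script above needs to change. A minimal sketch of picking the URI by quantization format (the `model_uri` helper and `PREBUILT` table are hypothetical conveniences, not part of `mlc_chat`):

```python
# Hypothetical lookup table of the prebuilt Mixtral weights listed above.
PREBUILT = {
    "q3f16_1": "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q3f16_1-MLC",  # ~19.7 GB
    "q4f16_1": "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC",  # ~24.5 GB
}

def model_uri(quantization: str) -> str:
    """Return the HF:// URI for a prebuilt Mixtral weight set."""
    try:
        return PREBUILT[quantization]
    except KeyError:
        raise ValueError(f"no prebuilt Mixtral weights for {quantization!r}")

print(model_uri("q3f16_1"))
```

The 3-bit weights trade some quality for roughly 5 GB less memory, which can matter on Macs with less unified memory.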
@junrushao junrushao marked this pull request as ready for review January 8, 2024 06:30
@jinhongyii jinhongyii merged commit 0bfb6c0 into mlc-ai:main Jan 8, 2024
@junrushao junrushao mentioned this pull request Jan 8, 2024
@junrushao (Member, Author) commented:

NOTE: this may take a few extra days until 4 outstanding PRs in TVM are merged. For those who are curious, I have a working branch of TVM if you'd like to build it from source: https://github.com/junrushao/tvm/commits/mixtral-debug/
