Support Mixtral on macOS #1558

Merged · 1 commit · Jan 8, 2024

Conversation

@junrushao (Member) commented on Jan 8, 2024:

A follow-up of my previous PR (#1529).

This PR makes Mixtral work on the Metal GPUs that macOS comes with. Honestly, not much change is needed, except that Metal doesn't support the fp64 data type.
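Since Metal lacks fp64, any double-precision values have to be expressed in fp32 instead. As a generic illustration of what that down-cast gives up (this is not the PR's actual code, just a stdlib sketch), round-tripping a Python float through IEEE-754 single precision discards the extra digits:

```python
import struct

def to_fp32(x: float) -> float:
    """Round-trip a Python float (fp64) through IEEE-754 fp32,
    mimicking the down-cast required on backends without fp64."""
    return struct.unpack("f", struct.pack("f", x))[0]

value = 1.0000001234567891  # carries more precision than fp32 can hold
print(to_fp32(value))  # nearby fp32-representable value; trailing digits are lost
```

Values already representable in fp32 (e.g. `0.5`) survive the round-trip exactly; everything else snaps to the nearest fp32-representable number, which is usually acceptable for inference workloads.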

A Python script to run Mixtral:

```python
from mlc_chat import ChatConfig, ChatModule, callback
from mlc_chat.support import logging

logging.enable_logging()

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"
NUM_GPU = 1

def main():
    cm = ChatModule(
        MODEL,
        chat_config=ChatConfig(
            sliding_window_size=1024,
            tensor_parallel_shards=NUM_GPU,
        ),
    )
    cm.generate(
        "What is the meaning of life?",
        progress_callback=callback.StreamToStdout(callback_interval=2),
    )

if __name__ == "__main__":
    main()
```

Quantization formats:

- 3-bit (19.662 GB): ["HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q3f16_1-MLC"](https://huggingface.co/junrushao/Mixtral-8x7B-Instruct-v0.1-q3f16_1-MLC)
- 4-bit (24.466 GB): ["HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"](https://huggingface.co/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC)
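To switch between these prebuilt weight sets, only the `MODEL` string in the script above needs to change. A minimal sketch of picking the URI by quantization format (the `model_uri` helper and `PREBUILT` table are hypothetical conveniences, not part of `mlc_chat`):

```python
# Hypothetical lookup table of the prebuilt Mixtral weights listed above.
PREBUILT = {
    "q3f16_1": "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q3f16_1-MLC",  # ~19.7 GB
    "q4f16_1": "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC",  # ~24.5 GB
}

def model_uri(quantization: str) -> str:
    """Return the HF:// URI for a prebuilt Mixtral weight set."""
    try:
        return PREBUILT[quantization]
    except KeyError:
        raise ValueError(f"no prebuilt Mixtral weights for {quantization!r}")

print(model_uri("q3f16_1"))
```

The 3-bit weights trade some quality for roughly 5 GB less memory, which can matter on Macs with less unified memory.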
@junrushao junrushao marked this pull request as ready for review January 8, 2024 06:30
@jinhongyii jinhongyii merged commit 0bfb6c0 into mlc-ai:main Jan 8, 2024
@junrushao junrushao mentioned this pull request Jan 8, 2024
@junrushao (Member, Author) commented:

NOTE: this may take a few extra days until 4 outstanding PRs in TVM are merged. For those who are curious, I have a working branch of TVM if you'd like to build it from source: https://github.com/junrushao/tvm/commits/mixtral-debug/
