
Minimal changes to get Mistral-7B-Instruct-v0.1 working #986

Merged: 1 commit from jeethu/mistral into mlc-ai:main on Sep 28, 2023

Conversation

@jeethu (Contributor) commented Sep 27, 2023

To build the model on macOS:

git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
# in the mlc-llm dir, rebuild mlc_chat_cli and then:
python build.py --model <path_to_model_checkout> --quantization q4f16_1 --target metal
./build/mlc_chat_cli --model Mistral-7B-Instruct-v0.1-q4f16_1
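The same compiled artifact can also be driven from the Python API; a minimal sketch (assuming the mlc_chat Python package is installed and you run it from the mlc-llm directory so the relative dist/ path resolves):

import mlc_chat

# Minimal sketch: load the locally compiled model by name (mlc_chat resolves it
# under dist/ relative to the current working directory; alternatively, pass the
# full path to the params folder) and run a single prompt.
cm = mlc_chat.ChatModule(model='Mistral-7B-Instruct-v0.1-q4f16_1')
print(cm.generate('hi'))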

@junrushao (Member) left a comment


This is great, thanks for your contribution!

@junrushao merged commit 6598cec into mlc-ai:main on Sep 28, 2023
@masahi (Contributor) commented Sep 28, 2023

I heard that this model uses a new kind of attention (sliding window stuff). And from huggingface/transformers#26447, this model doesn't seem to share the same architecture as Llama. So I wonder why using the Mistral weights with the Llama model is supposed to work?

@jeethu deleted the jeethu/mistral branch on September 28, 2023 at 12:40
@jeethu (Contributor, Author) commented Sep 28, 2023

So I wonder why using the Mistral weights with the Llama model is supposed to work?

IIUC, SWA is only needed for extrapolating to sequences longer than 4096 tokens. Otherwise, the architecture is the same, except that Mistral-7B uses GQA, while the Llama 2 family only uses GQA for the 70B model (Llama 2 7B and 13B use vanilla MHA). The GQA support added for Llama 2 models in #567 handles that difference transparently.
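To make the GQA difference concrete, here is a toy PyTorch sketch (illustrative only, not mlc-llm code; the head counts are the commonly cited ones for Mistral-7B, 32 query heads sharing 8 KV heads, whereas Llama 2 7B/13B use an equal number of query and KV heads):

import torch

# Illustrative sizes; in GQA the KV heads are fewer than the query heads.
num_q_heads, num_kv_heads, head_dim, seq = 32, 8, 128, 16

q = torch.randn(1, num_q_heads, seq, head_dim)
k = torch.randn(1, num_kv_heads, seq, head_dim)
v = torch.randn(1, num_kv_heads, seq, head_dim)

# Each group of query heads shares one KV head, so the KV tensors are simply
# repeated along the head dimension before ordinary scaled-dot-product attention.
group = num_q_heads // num_kv_heads
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-1, -2) / head_dim ** 0.5, dim=-1) @ v
print(attn.shape)  # (1, 32, 16, 128)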

@masahi (Contributor) commented Sep 28, 2023

I see, I guess that's what you meant by "Minimal changes". Indeed, looking at huggingface/transformers#26447 more closely, their 900-line modeling_mistral.py is an exact copy of modeling_llama.py except for the causal mask creation 🤦‍♂️
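For anyone following along, a toy sketch of that mask difference (not Transformers or mlc-llm code, just an illustration, with a made-up window of 4 instead of Mistral's 4096):

import torch

seq_len, window = 8, 4  # toy sizes; Mistral's actual sliding window is 4096

i = torch.arange(seq_len).unsqueeze(1)  # query positions
j = torch.arange(seq_len).unsqueeze(0)  # key positions

# Plain causal mask: query i may attend to every key j <= i.
causal = j <= i

# Sliding-window causal mask: query i may only attend to keys in
# (i - window, i], i.e. the most recent `window` positions.
sliding = causal & (j > i - window)

print(sliding.int())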

@jeethu (Contributor, Author) commented Sep 29, 2023

Thanks for linking to the huggingface transformers PR. If it's as simple as removing causal masking, I'll take a stab at it over the weekend.

@ZebinYang commented

I can successfully build it on a Mac Studio. In the last step, "mlc_chat_cli --model Mistral-7B-Instruct-v0.1-q4f16_1", I got these messages:

Loading model...
Loading finished
Running system prompts...
System prompts finished
[INST]: 

But as soon as I type any prompt, the process terminates with the following errors:

[02:29:57] /Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698048403779/work/3rdparty/tvm/src/runtime/relax_vm/pooled_allocator.h:64: Warning: PooledAllocator got InternalError during allocation: InternalError: Check failed: (buf != nil) is false: 
[02:29:57] /Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698048403779/work/3rdparty/tvm/src/runtime/relax_vm/pooled_allocator.h:65: Warning: Trying to release all unused memory and reallocate...
libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: [02:29:57] /Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698048403779/work/3rdparty/tvm/include/tvm/runtime/packed_func.h:1307: unknown type = 0
Stack trace:
  [bt] (0) 1   libtvm_runtime.dylib                0x0000000102ac595c tvm::runtime::detail::LogFatal::Entry::Finalize() + 68
  [bt] (1) 2   libtvm_runtime.dylib                0x0000000102ac5918 tvm::runtime::detail::LogFatal::Entry::Finalize() + 0
  [bt] (2) 3   libtvm_runtime.dylib                0x0000000102abfc20 __clang_call_terminate + 0
  [bt] (3) 4   libtvm_runtime.dylib                0x0000000102b8f48c tvm::runtime::relax_vm::MemoryManager::GetAllocator(DLDevice) + 640
  [bt] (4) 5   libtvm_runtime.dylib                0x0000000102b6cf34 tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::relax_vm::StorageObj>::Deleter_(tvm::runtime::Object*) + 28
  [bt] (5) 6   libtvm_runtime.dylib                0x0000000102b68f70 tvm::runtime::relax_vm::VMAllocStorage(void*, tvm::runtime::ShapeTuple, long long, DLDataType, tvm::runtime::String) + 980
  [bt] (6) 7   libtvm_runtime.dylib                0x0000000102b6dab0 void tvm::runtime::TypedPackedFunc<tvm::runtime::relax_vm::Storage (void*, tvm::runtime::ShapeTuple, long long, DLDataType, tvm::runtime::String)>::AssignTypedLambda<tvm::runtime::relax_vm::Storage (*)(void*, tvm::runtime::ShapeTuple, long long, DLDataType, tvm::runtime::String)>(tvm::runtime::relax_vm::Storage (*)(void*, tvm::runtime::ShapeTuple, long long, DLDataType, tvm::runtime::String), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>)::'lambda'(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const + 284
  [bt] (7) 8   libtvm_runtime.dylib                0x0000000102ba5614 tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 96
  [bt] (8) 9   libtvm_runtime.dylib                0x0000000102ba7584 tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction) + 1504

Any idea what is going wrong here? Thanks

@CharlieFRuan (Contributor) commented

Hi @ZebinYang, thanks for reporting the issue! I just tried Mistral Instruct on a Mac Studio and it worked fine. The problem you are seeing is probably due to #1087, where we updated llm_chat.cc. Therefore, you would need to either install the latest nightly or build from source from the latest repo.
(Screenshot: Mistral Instruct running successfully in mlc_chat_cli, Nov 6, 2023)
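For reference, the prebuilt nightly packages can be updated with the same command ZebinYang uses later in this thread:

pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels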

@ZebinYang commented

Hi @CharlieFRuan,

Thanks for your response.
After upgrading everything, I am able to run the following script:

import mlc_chat
cm = mlc_chat.ChatModule(model='Mistral-7B-Instruct-v0.1-q4f16_1')
cm.generate('hi')

However, I have some further questions.

  1. If I switch the working directory to another folder (other than mlc-llm), the script raises an error (screenshot of the error message).

     And it does not work even if I specify the full path of the compiled model.

  2. The CLI command still does not work:

     mlc_chat_cli --model Mistral-7B-Instruct-v0.1-q4f16_1

     with error messages (screenshot of the errors).

@CharlieFRuan (Contributor) commented

Hi @ZebinYang, for question 1, there are two relevant things:

  • If you compiled it locally, to pass the full path, you would need to supply the params folder, which might be a bit counter-intuitive (we are mainly looking for the mlc-chat-config.json, which resides in params).
    • e.g. cm = mlc_chat.ChatModule(model='/full/path/to/mlc-llm/dist/Mistral-7B-Instruct-v0.1-q4f16_1/params')
  • There is also an argument called model_lib_path which lets you point at a different model library file (see the sketch below).
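Putting both together, a minimal sketch (the paths and the library filename below are hypothetical examples, not the exact output of any particular build):

import mlc_chat

# Both paths are hypothetical; adjust them to wherever your local build
# placed the artifacts under dist/.
cm = mlc_chat.ChatModule(
    # `model` points at the params folder containing mlc-chat-config.json.
    model='/full/path/to/mlc-llm/dist/Mistral-7B-Instruct-v0.1-q4f16_1/params',
    # `model_lib_path` optionally points at the compiled model library file.
    model_lib_path='/full/path/to/mlc-llm/dist/Mistral-7B-Instruct-v0.1-q4f16_1/Mistral-7B-Instruct-v0.1-q4f16_1-metal.so',
)
print(cm.generate('hi'))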

For question 2, did you update to the latest nightly, or pull the latest repo and build from source?

@ZebinYang commented

Hi @CharlieFRuan

Thanks, it worked once I added "params" to the path.
I simply updated the dependencies using:

pip install --pre --force-reinstall mlc-ai-nightly mlc-chat-nightly -f https://mlc.ai/wheels
