iOS Exception: Could not load model at ... #44
Upon further debugging, here is a lovely Xcode output: `llama_load_model_from_file: using device Metal (Apple A15 GPU) - 2727 MiB free`
After even more investigation, it seems the model I was using was malformed. Using a new model, I now get the following error: `/llama.cpp/src/llama-sampling.cpp:279: GGML_ASSERT(cur_p.selected >= 0 && cur_p.selected < (int32_t) cur_p.size) failed`. Any idea on this? Cheers!
Further update: I got past the above GGML_ASSERTs by modifying the model params and context params, but hit another wall when it comes to encoding: `llama.cpp/src/llama.cpp:15342: GGML_ASSERT(n_outputs_enc > 0 && "call llama_encode() first") failed`. Any ideas here?
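For context on this assertion: flan-T5 is an encoder-decoder model, and in llama.cpp the encoder pass must run before any decoding, which is exactly what the `"call llama_encode() first"` message is saying. A minimal sketch of the expected call order against llama.cpp's C API (`llama_encode` and `llama_decode` are in `llama.h`; the batch setup and error handling here are assumptions, and exact signatures vary by version):

```cpp
#include "llama.h"

// Sketch: for encoder-decoder models (e.g. flan-T5), run the encoder
// pass before decoding, otherwise GGML_ASSERT(n_outputs_enc > 0) fires.
bool generate_t5(llama_context * ctx, llama_batch prompt, llama_batch step) {
    if (llama_encode(ctx, prompt) != 0) {  // encoder pass fills n_outputs_enc
        return false;
    }
    return llama_decode(ctx, step) == 0;   // decoding is now legal
}
```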
@LukeMoody01 I could run tinyllama-2-1b-miniguanaco.Q3_K_L.gguf with the scripts in the example folder, both simple.dart and chat.dart. Could it be that your prompt is larger than the context? I will try to investigate.
@LukeMoody01 please try again
Hey @netdur, I will give it a go using the model you just mentioned! I was using a flan-T5 model, and I am also on iOS. I'll get back to you soon.
Alright, so it "works". The AI likes to cut off its response very early, but I feel like that could be a config issue on my end. How do you usually allow the AI to have lengthier responses? @netdur
Thanks. In the Llama class, I have the predict field fixed at a low value; this sets the length of the output. I will expose it.
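Until that setting is exposed, a hypothetical sketch of what the knob might look like from the package side (`ContextParams`, `nPredict`, and the `Llama` constructor shape here are guesses, not the package's confirmed API):

```dart
// Hypothetical: names below are assumptions about what the exposed
// output-length setting could look like in this package.
final params = ContextParams()..nPredict = 512; // allow longer replies
final llama = Llama('path/to/model.gguf', contextParams: params);
```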
That'd be awesome. Great work @netdur 😄
Can I also ask, where do you find the models you test with? Some of the models I find on Hugging Face throw errors such as the ones above, as well as `GGML_ASSERT(strcmp(res->name, "result_output") == 0 && "missing result_output tensor")`.
I tested this model: https://huggingface.co/MaziyarPanahi/gemma-7b-GGUF/blob/main/gemma-7b.Q8_0.gguf
@LukeMoody01 I am currently testing on iOS; how do you build llama.cpp for iOS?
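For what it's worth, a sketch of an Xcode-based cross-compile, loosely following the upstream llama.cpp build docs (flag names change between llama.cpp versions, so treat these as a starting point rather than a confirmed recipe):

```shell
# Cross-compile llama.cpp for iOS with Metal (sketch; verify flags
# against the llama.cpp version you are pinned to).
cmake -B build-ios -G Xcode \
  -DCMAKE_SYSTEM_NAME=iOS \
  -DCMAKE_OSX_DEPLOYMENT_TARGET=14.0 \
  -DBUILD_SHARED_LIBS=ON \
  -DGGML_METAL=ON \
  -DGGML_METAL_EMBED_LIBRARY=ON
cmake --build build-ios --config Release
```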
Hi there,
First off, thanks for the hard work creating this package.
I am currently having some issues getting the package to run on iOS. I am loading both dynamic libraries, `libggml.dylib` and `libllama.dylib`, as such:
The issue comes when trying to load a model like so:
Even doing it as a raw path also does not work.
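For reference, loading order matters here: `libllama.dylib` resolves symbols from `libggml.dylib`. A minimal `dart:ffi` sketch of how the two libraries are typically opened (the iOS fallback to `DynamicLibrary.process()` reflects the common pattern of linking the libraries into the app bundle; treat the paths as assumptions):

```dart
import 'dart:ffi';
import 'dart:io';

// Sketch: open ggml before llama, since libllama depends on ggml symbols.
DynamicLibrary openLlama() {
  if (Platform.isIOS) {
    // On iOS, the libraries are usually linked into the app bundle,
    // so the process-wide lookup is often what actually works.
    return DynamicLibrary.process();
  }
  DynamicLibrary.open('libggml.dylib');
  return DynamicLibrary.open('libllama.dylib');
}
```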
We are using the latest dev branch (commit hash 231a3e8).
Any help or guidance here would be greatly appreciated.
Error:
```
Could not load model XYZ
flutter: Error: LateInitializationError: Field 'context' has not been initialized.
```