
iOS Exception: Could not load model at ... #44

Open
LukeMoody01 opened this issue Nov 26, 2024 · 12 comments

@LukeMoody01
Hi there,

First off, thanks for the hard work creating this package.

I am having some issues getting the package to run on iOS. I am loading both dynamic libraries, libggml.dylib and libllama.dylib, like so:

    Llama.libraryPath = "libllama.dylib";
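
For completeness, the same assignment can be made platform-aware. This is only a sketch; the .so name for other platforms is my assumption, not something from the package docs:

    import 'dart:io' show Platform;

    // Sketch: pick the library file per platform. On iOS the dylib
    // must actually be bundled with the app for it to resolve.
    Llama.libraryPath = Platform.isIOS || Platform.isMacOS
        ? 'libllama.dylib'
        : 'libllama.so';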

The issue comes when trying to load a model like so:

    final model = 'ggml-vocab-gpt-2.gguf';

    final directory = await getApplicationDocumentsDirectory();
    final filePath = '${directory.path}/$model';

    // Copy the bundled asset out to the documents directory once,
    // so the native code gets a real file path to load from.
    final fileExists = await File(filePath).exists();
    if (!fileExists) {
      final byteData = await rootBundle.load('assets/ai/$model');
      final file = File(filePath);
      await file.writeAsBytes(byteData.buffer
          .asUint8List(byteData.offsetInBytes, byteData.lengthInBytes));
    }

    Llama llama = Llama(
      filePath,
      modelParams,
      contextParams,
      samplerParams,
    );

Even using a raw path:

    Llama llama = Llama(
      '/Users/Luke/Workspace/llama.cpp/models/tinyllama-2-1b-miniguanaco.Q3_K_L.gguf',
      modelParams,
      contextParams,
      samplerParams,
    );

also does not work.
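
One cheap sanity check before constructing Llama is to verify that the copied file starts with the four-byte GGUF magic ("GGUF"), which rules out a truncated or mangled asset copy. A minimal sketch (the helper name is mine, not part of the package):

    import 'dart:io';

    // Returns true if the file at [path] begins with the ASCII magic
    // "GGUF" that every GGUF file carries in its first four bytes.
    Future<bool> looksLikeGguf(String path) async {
      final raf = await File(path).open();
      try {
        final magic = await raf.read(4);
        return String.fromCharCodes(magic) == 'GGUF';
      } finally {
        await raf.close();
      }
    }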

We are using the latest dev branch (commit hash 231a3e8).

Any help or guidance here would be greatly appreciated.

Error:

Could not load model XYZ
flutter: Error: LateInitializationError: Field 'context' has not been initialized.

@LukeMoody01
Author

Upon further debugging, here is the lovely Xcode output:

llama_load_model_from_file: using device Metal (Apple A15 GPU) - 2727 MiB free
llama_model_loader: loaded meta data with 16 key-value pairs and 0 tensors from /var/mobile/Containers/Data/Application/C44C7A73-3C3E-4778-953F-B3F8412A71BF/Documents/ggml-vocab-gpt-2.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gpt2
llama_model_loader: - kv 1: general.name str = gpt-2
llama_model_loader: - kv 2: gpt2.block_count u32 = 12
llama_model_loader: - kv 3: gpt2.context_length u32 = 1024
llama_model_loader: - kv 4: gpt2.embedding_length u32 = 768
llama_model_loader: - kv 5: gpt2.feed_forward_length u32 = 3072
llama_model_loader: - kv 6: gpt2.attention.head_count u32 = 12
llama_model_loader: - kv 7: gpt2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 8: general.file_type u32 = 1
llama_model_loader: - kv 9: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 10: tokenizer.ggml.pre str = gpt-2
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr[str,50257] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 12: tokenizer.ggml.token_type arr[i32,50257] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 13: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv 14: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv 15: tokenizer.ggml.eos_token_id u32 = 50256
llm_load_vocab: special tokens cache size = 1
llm_load_vocab: token to piece cache size = 0.3060 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = gpt2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 50257
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 1024
llm_load_print_meta: n_embd = 768
llm_load_print_meta: n_layer = 12
llm_load_print_meta: n_head = 12
llm_load_print_meta: n_head_kv = 12
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 768
llm_load_print_meta: n_embd_v_gqa = 768
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 3072
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = -1
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 1024
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 0.1B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 0.00 K
llm_load_print_meta: model size = 0.00 MiB (nan BPW)
llm_load_print_meta: general.name = gpt-2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOT token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 50256 '<|endoftext|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.03 MiB
llama_model_load: error loading model: check_tensor_dims: tensor 'token_embd.weight' not found
llama_load_model_from_file: failed to load model

@LukeMoody01
Author

After even more investigation, it seems the file I was using was never a full model: ggml-vocab-gpt-2.gguf is one of llama.cpp's vocab-only tokenizer test files, which matches the "0 tensors" in the log above. Using a real model, I now get the following error:

/llama.cpp/src/llama-sampling.cpp:279: GGML_ASSERT(cur_p.selected >= 0 && cur_p.selected < (int32_t) cur_p.size) failed

Any ideas on this?

Cheers!

@LukeMoody01
Author

Further update: I got past the above GGML_ASSERT by modifying the model params and context params.
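
For anyone hitting the same assert, the kind of change involved was along these lines; the field names below are illustrative, so check the package's ContextParams definition for the real ones:

    // Illustrative field names, not necessarily the package's exact API.
    final contextParams = ContextParams()
      ..nCtx = 2048    // give the prompt enough context
      ..nBatch = 512;  // keep the batch size within the context size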

I have hit another wall when it comes to encoding:

llama.cpp/src/llama.cpp:15342: GGML_ASSERT(n_outputs_enc > 0 && "call llama_encode() first") failed

Any ideas here?

@netdur
Owner

netdur commented Nov 26, 2024

@LukeMoody01 I could run tinyllama-2-1b-miniguanaco.Q3_K_L.gguf with the scripts in the example folder, both simple.dart and chat.dart. Could it be that your prompt is larger than the context? I will try to investigate.
Also, thank you for trying the dev branch.

@netdur
Owner

netdur commented Nov 26, 2024

@LukeMoody01 please try again

@LukeMoody01
Author

Hey @netdur,

I will give it a go using the model you just mentioned! I was using a flan-T5 model, which is an encoder-decoder architecture, so that may explain the llama_encode() assert above.

For reference, I am on iOS.

I'll get back to you soon.

@LukeMoody01
Author

Alright, so it "works".

The AI likes to cut off its response very early, but I feel like that could be a config issue on my end. How do you usually allow the A.I to have lengthier responses? @netdur
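
For context, llama.cpp itself has an n_predict knob where -1 means "generate until EOS"; I'm guessing the fix is the package equivalent of something like this (field names illustrative):

    // Illustrative field names -- check the package's param classes.
    samplerParams.nPredict = -1; // llama.cpp convention: -1 = until EOS
    contextParams.nCtx = 4096;   // longer replies also need context room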

@netdur
Owner

netdur commented Nov 27, 2024 via email

@LukeMoody01
Author

That'd be awesome. Great work @netdur 😄

@LukeMoody01
Author

Can I also ask: where do you find the models you test with? Some of the models I find on Hugging Face throw errors such as the ones above, as well as "GGML_ASSERT(strcmp(res->name, "result_output") == 0 && "missing result_output tensor")".

@netdur
Owner

netdur commented Nov 29, 2024

@LukeMoody01 I am currently testing on iOS. How do you build llama.cpp for iOS?
