
b3615 #306

Merged
13 commits merged into Nexesenex:spacestream on Aug 22, 2024
Conversation

Nexesenex (Owner)

No description provided.

rgerganov and others added 13 commits August 19, 2024 10:10
Add more checks to prevent the RPC server from crashing when invalid input
is received from the client
Co-authored-by: xuedinge233 <[email protected]>
Co-authored-by: hipudding <[email protected]>
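
For context, the RPC hardening above amounts to validating every client-supplied size and offset before using it. A minimal sketch of the pattern, assuming a hypothetical wire format (this is not the actual ggml-rpc message layout):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Defensive deserialization: reject the message instead of trusting the
// client-supplied size header. The layout here is hypothetical.
static bool recv_tensor_data(const std::vector<uint8_t> & msg,
                             std::vector<uint8_t> & out) {
    if (msg.size() < sizeof(uint64_t)) {
        return false; // too short to even hold the size header
    }
    uint64_t data_size = 0;
    std::memcpy(&data_size, msg.data(), sizeof(data_size));

    if (data_size != msg.size() - sizeof(uint64_t)) {
        return false; // header disagrees with the actual payload length
    }
    out.assign(msg.begin() + sizeof(uint64_t), msg.end());
    return true;
}
```
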
* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix convert overflow

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <[email protected]>

* sycl: refine convert

Signed-off-by: zhentaoyu <[email protected]>

* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <[email protected]>

* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <[email protected]>

* test: make new cases only in sycl

Signed-off-by: zhentaoyu <[email protected]>

* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <[email protected]>

---------

Signed-off-by: zhentaoyu <[email protected]>
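
The overflow fixes above are all instances of one bug class: flat element offsets computed in 32-bit arithmetic wrap once a tensor exceeds ~2^31 elements. A reduced sketch of the fix (names are illustrative, not the actual SYCL kernels):

```cpp
#include <cstdint>

// Keep index arithmetic in 64-bit; the commented-out line shows the
// overflow-prone variant this class of fix removes.
void copy_row(const float * src, float * dst, int64_t ne0, int64_t row) {
    // broken: (int) row * (int) ne0 can wrap to a negative offset
    // int offset = (int) row * (int) ne0;

    const int64_t offset = row * ne0; // fixed: 64-bit throughout
    for (int64_t i = 0; i < ne0; ++i) {
        dst[offset + i] = src[offset + i];
    }
}
```
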
* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <[email protected]>

---------

Co-authored-by: Alberto Cabrera Pérez <[email protected]>
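
The mmvq commits describe dispatch-time control flow: try the quantized mat-vec kernel first and route unsupported cases to the general mul_mat path. A hedged sketch of that shape (all names here are placeholders, not the ggml-sycl code):

```cpp
struct tensor { int rows, cols, type; };

// Placeholder capability check; the real code inspects quant types and shapes.
static bool supports_mmvq(const tensor &, const tensor & src1) {
    return src1.cols == 1; // mat-vec only applies to a single-column src1
}

static void mul_mat_vec_q(const tensor &, const tensor &, tensor &) { /* fast kernel */ }
static void mul_mat      (const tensor &, const tensor &, tensor &) { /* general kernel */ }

void dispatch_mul_mat(const tensor & src0, const tensor & src1, tensor & dst) {
    if (supports_mmvq(src0, src1)) {
        mul_mat_vec_q(src0, src1, dst);
    } else {
        mul_mat(src0, src1, dst); // fallback described in the commits above
    }
}
```
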
llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984)

* llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model.

- The CLIP model now prioritizes the Vulkan backend over the CPU when Vulkan is available.
- A GGML_OP_ACC shader has been added.
- The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.

Signed-off-by: Changyeon Kim <[email protected]>

* Fix up coding style.

Signed-off-by: Changyeon Kim <[email protected]>

* Fix up the missing initial parameter to resolve the compilation warning.

Signed-off-by: Changyeon Kim <[email protected]>

* [fix] Add missing parameters.

Signed-off-by: Changyeon Kim <[email protected]>

* [fix] Use nb1 and nb2 for dst.

Signed-off-by: Changyeon Kim <[email protected]>

* Fix result checks for the ggml_acc call

---------

Signed-off-by: Changyeon Kim <[email protected]>
Co-authored-by: 0cc4m <[email protected]>
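
GGML_OP_ACC, which the new shader implements, accumulates a tensor b into a strided view of a. A sketch of building such a node, assuming an initialized ggml_context ctx (ggml_acc is the existing ggml API; only the shapes here are made up):

```cpp
// b is added into the region of a described by byte strides nb1..nb3 and a
// byte offset; taking the strides from a lines rows of b up with rows of a.
struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 64);
struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 32, 32);

struct ggml_tensor * out = ggml_acc(ctx, a, b, a->nb[1], a->nb[2], a->nb[3], 0);
```
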
* llama : std::move llm_bigram_bpe from work_queue

This commit updates the retrieval of llm_bigram_bpe objects from
work_queue.top() by using std::move.

The motivation for this is to avoid the copying of the std::string
`text` member of the llm_bigram_bpe struct.

* squash! llama : std::move llm_bigram_bpe from work_queue

Introduced a MovablePriorityQueue class to allow moving elements
out of the priority queue for llm_bigram_bpe.

* squash! llama : std::move llm_bigram_bpe from work_queue

Rename MovablePriorityQueue to lama_priority_queue.

* squash! llama : std::move llm_bigram_bpe from work_queue

Rename lama_priority_queue -> llama_priority_queue.
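
The reason a custom queue is needed: std::priority_queue::top() returns a const reference, so the string inside llm_bigram_bpe cannot be moved out through the standard interface. A subclass, however, has protected access to the underlying container. A sketch of the idea, close to but not necessarily identical to the llama_priority_queue that landed:

```cpp
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

// std::priority_queue exposes its container `c` and comparator `comp` to
// subclasses, so the front element can be moved out before popping.
template <typename T, typename Container = std::vector<T>,
          typename Compare = std::less<typename Container::value_type>>
class movable_priority_queue : public std::priority_queue<T, Container, Compare> {
public:
    T pop_move() {
        T item = std::move(this->c.front());
        std::pop_heap(this->c.begin(), this->c.end(), this->comp);
        this->c.pop_back();
        return item; // moved, not copied -- the point of the change
    }
};
```
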
* feat: initial support for llama.cpp

* fix: lint

* refactor: better refactor

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* fix: address comments

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>

* fix: add more cleanup and harmonization

* fix: lint

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* fix: change name

* Apply suggestions from code review

Co-authored-by: compilade <[email protected]>

* add in operator

* fix: add `dt_b_c_rms` in `llm_load_print_meta`

* fix: correct printf format for bool

* fix: correct print format

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* llama : quantize more Mamba tensors

* llama : use f16 as the fallback of fallback quant types

---------

Co-authored-by: compilade <[email protected]>
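
On the printf-format fixes above: a bool passed through varargs promotes to int, so "%d" technically works, but printing the words reads better in the metadata dump. A minimal example:

```cpp
#include <cstdio>

int main() {
    bool dt_b_c_rms = true;
    std::printf("dt_b_c_rms = %s\n", dt_b_c_rms ? "true" : "false");
    return 0;
}
```
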
* server : support reading arguments from environment variables

* add -fa and -dt

* readme : specify non-arg env var
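
A minimal sketch of the fallback the server commit describes: an option takes its value from the CLI when given, otherwise from an environment variable. LLAMA_ARG_MODEL follows the LLAMA_ARG_* naming the readme change refers to, but treat the exact variable name here as illustrative:

```cpp
#include <cstdlib>
#include <string>

static std::string arg_or_env(const std::string & cli_value, const char * env_name) {
    if (!cli_value.empty()) {
        return cli_value; // an explicit CLI argument wins
    }
    const char * v = std::getenv(env_name);
    return v != nullptr ? std::string(v) : std::string();
}

// usage: std::string model = arg_or_env(parsed_model_arg, "LLAMA_ARG_MODEL");
```
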
* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <[email protected]>

* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <[email protected]>
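
A hedged usage sketch of the helper added above; llama_model_is_recurrent comes from this PR's llama.h, while the surrounding function and its convention are illustrative:

```cpp
#include <cstdint>
#include "llama.h"

// Per the commit message: until pooled embeddings can be split across
// ubatches, recurrent models handle one sequence per ubatch.
static uint32_t max_seqs_per_ubatch(const struct llama_model * model, bool pooled) {
    if (pooled && llama_model_is_recurrent(model)) {
        return 1;
    }
    return UINT32_MAX; // no extra restriction for non-recurrent models
}
```
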
* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc
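
A hedged sketch of what "use fp16fp16fp16" suggests: a oneDNN (DNNL) matmul with f16 source, weights, and destination. Shapes and engine selection are illustrative; per "use dnnl for intel only", the real integration only takes this path on Intel GPUs:

```cpp
#include "oneapi/dnnl/dnnl.hpp"

void f16_matmul_example() {
    using namespace dnnl;
    engine eng(engine::kind::gpu, 0);
    stream  strm(eng);

    const memory::dim M = 64, K = 128, N = 32;
    memory::desc a_md({M, K}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc b_md({K, N}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc c_md({M, N}, memory::data_type::f16, memory::format_tag::ab);

    // memory objects allocate on the engine; real code wraps existing buffers
    memory a_mem(a_md, eng), b_mem(b_md, eng), c_mem(c_md, eng);

    matmul::primitive_desc pd(eng, a_md, b_md, c_md);
    matmul(pd).execute(strm, {{DNNL_ARG_SRC,     a_mem},
                              {DNNL_ARG_WEIGHTS, b_mem},
                              {DNNL_ARG_DST,     c_mem}});
    strm.wait();
}
```
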
Nexesenex merged commit 76ac396 into Nexesenex:spacestream on Aug 22, 2024
20 checks passed