
b3615 #306

Merged
13 commits merged into Nexesenex:spacestream on Aug 22, 2024
Conversation

Nexesenex (Owner)

No description provided.

rgerganov and others added 13 commits August 19, 2024 10:10
Add more checks to prevent the RPC server from crashing when invalid input
is received from the client
Co-authored-by: xuedinge233 <[email protected]>
Co-authored-by: hipudding <[email protected]>
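
For context, the RPC hardening above amounts to validating every client-supplied size and offset before using it. A minimal sketch of the pattern, assuming a hypothetical wire format (this is not the actual ggml-rpc message layout):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Defensive deserialization: reject the message instead of trusting the
// client-supplied size header. The layout here is hypothetical.
static bool recv_tensor_data(const std::vector<uint8_t> & msg,
                             std::vector<uint8_t> & out) {
    if (msg.size() < sizeof(uint64_t)) {
        return false; // too short to even hold the size header
    }
    uint64_t data_size = 0;
    std::memcpy(&data_size, msg.data(), sizeof(data_size));

    if (data_size != msg.size() - sizeof(uint64_t)) {
        return false; // header disagrees with the actual payload length
    }
    out.assign(msg.begin() + sizeof(uint64_t), msg.end());
    return true;
}
```
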
* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix convert overflow

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <[email protected]>

* sycl: refine convert

Signed-off-by: zhentaoyu <[email protected]>

* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <[email protected]>

* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <[email protected]>

* test: make new cases only in sycl

Signed-off-by: zhentaoyu <[email protected]>

* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <[email protected]>

---------

Signed-off-by: zhentaoyu <[email protected]>
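
The overflow fixes above are all instances of one bug class: flat element offsets computed in 32-bit arithmetic wrap once a tensor exceeds ~2^31 elements. A reduced sketch of the fix (names are illustrative, not the actual SYCL kernels):

```cpp
#include <cstdint>

// Keep index arithmetic in 64-bit; the commented-out line shows the
// overflow-prone variant this class of fix removes.
void copy_row(const float * src, float * dst, int64_t ne0, int64_t row) {
    // broken: (int) row * (int) ne0 can wrap to a negative offset
    // int offset = (int) row * (int) ne0;

    const int64_t offset = row * ne0; // fixed: 64-bit throughout
    for (int64_t i = 0; i < ne0; ++i) {
        dst[offset + i] = src[offset + i];
    }
}
```
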
* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <[email protected]>

---------

Co-authored-by: Alberto Cabrera Pérez <[email protected]>
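
The mmvq commits describe dispatch-time control flow: try the quantized mat-vec kernel first and route unsupported cases to the general mul_mat path. A hedged sketch of that shape (all names here are placeholders, not the ggml-sycl code):

```cpp
struct tensor { int rows, cols, type; };

// Placeholder capability check; the real code inspects quant types and shapes.
static bool supports_mmvq(const tensor &, const tensor & src1) {
    return src1.cols == 1; // mat-vec only applies to a single-column src1
}

static void mul_mat_vec_q(const tensor &, const tensor &, tensor &) { /* fast kernel */ }
static void mul_mat      (const tensor &, const tensor &, tensor &) { /* general kernel */ }

void dispatch_mul_mat(const tensor & src0, const tensor & src1, tensor & dst) {
    if (supports_mmvq(src0, src1)) {
        mul_mat_vec_q(src0, src1, dst);
    } else {
        mul_mat(src0, src1, dst); // fallback described in the commits above
    }
}
```
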
llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984)

* llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model.

- The CLIP model now prioritizes the Vulkan backend over the CPU when Vulkan is available.
- A GGML_OP_ACC shader has been added.
- The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.

Signed-off-by: Changyeon Kim <[email protected]>

* Fix up coding style.

Signed-off-by: Changyeon Kim <[email protected]>

* Fix up the missing initial parameter to resolve the compilation warning.

Signed-off-by: Changyeon Kim <[email protected]>

* [fix] Add missing parameters.

Signed-off-by: Changyeon Kim <[email protected]>

* [fix] Use nb1 and nb2 for dst.

Signed-off-by: Changyeon Kim <[email protected]>

* Fix result checks for the ggml_acc call

---------

Signed-off-by: Changyeon Kim <[email protected]>
Co-authored-by: 0cc4m <[email protected]>
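
GGML_OP_ACC, which the new shader implements, accumulates a tensor b into a strided view of a. A sketch of building such a node, assuming an initialized ggml_context ctx (ggml_acc is the existing ggml API; only the shapes here are made up):

```cpp
// b is added into the region of a described by byte strides nb1..nb3 and a
// byte offset; taking the strides from a lines rows of b up with rows of a.
struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 64);
struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 32, 32);

struct ggml_tensor * out = ggml_acc(ctx, a, b, a->nb[1], a->nb[2], a->nb[3], 0);
```
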
* llama : std::move llm_bigram_bpe from work_queue

This commit updates the retrieval of llm_bigram_bpe objects from
work_queue.top() by using std::move.

The motivation for this is to avoid the copying of the std::string
`text` member of the llm_bigram_bpe struct.

* squash! llama : std::move llm_bigram_bpe from work_queue

Introduced a MovablePriorityQueue class to allow moving elements
out of the priority queue for llm_bigram_bpe.

* squash! llama : std::move llm_bigram_bpe from work_queue

Rename MovablePriorityQueue to lama_priority_queue.

* squash! llama : std::move llm_bigram_bpe from work_queue

Rename lama_priority_queue -> llama_priority_queue.
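
The reason a custom queue is needed: std::priority_queue::top() returns a const reference, so the string inside llm_bigram_bpe cannot be moved out through the standard interface. A subclass, however, has protected access to the underlying container. A sketch of the idea, close to but not necessarily identical to the llama_priority_queue that landed:

```cpp
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

// std::priority_queue exposes its container `c` and comparator `comp` to
// subclasses, so the front element can be moved out before popping.
template <typename T, typename Container = std::vector<T>,
          typename Compare = std::less<typename Container::value_type>>
class movable_priority_queue : public std::priority_queue<T, Container, Compare> {
public:
    T pop_move() {
        T item = std::move(this->c.front());
        std::pop_heap(this->c.begin(), this->c.end(), this->comp);
        this->c.pop_back();
        return item; // moved, not copied -- the point of the change
    }
};
```
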
* feat: initial support for llama.cpp

* fix: lint

* refactor: better refactor

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* fix: address comments

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>

* fix: add more cleanup and harmonization

* fix: lint

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <[email protected]>

* fix: change name

* Apply suggestions from code review

Co-authored-by: compilade <[email protected]>

* add in operator

* fix: add `dt_b_c_rms` in `llm_load_print_meta`

* fix: correct printf format for bool

* fix: correct print format

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* llama : quantize more Mamba tensors

* llama : use f16 as the fallback of fallback quant types

---------

Co-authored-by: compilade <[email protected]>
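
On the printf-format fixes above: a bool passed through varargs promotes to int, so "%d" technically works, but printing the words reads better in the metadata dump. A minimal example:

```cpp
#include <cstdio>

int main() {
    bool dt_b_c_rms = true;
    std::printf("dt_b_c_rms = %s\n", dt_b_c_rms ? "true" : "false");
    return 0;
}
```
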
* server : support reading arguments from environment variables

* add -fa and -dt

* readme : specify non-arg env var
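
A minimal sketch of the fallback the server commit describes: an option takes its value from the CLI when given, otherwise from an environment variable. LLAMA_ARG_MODEL follows the LLAMA_ARG_* naming the readme change refers to, but treat the exact variable name here as illustrative:

```cpp
#include <cstdlib>
#include <string>

static std::string arg_or_env(const std::string & cli_value, const char * env_name) {
    if (!cli_value.empty()) {
        return cli_value; // an explicit CLI argument wins
    }
    const char * v = std::getenv(env_name);
    return v != nullptr ? std::string(v) : std::string();
}

// usage: std::string model = arg_or_env(parsed_model_arg, "LLAMA_ARG_MODEL");
```
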
* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <[email protected]>

* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <[email protected]>
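
A hedged usage sketch of the helper added above; llama_model_is_recurrent comes from this PR's llama.h, while the surrounding function and its convention are illustrative:

```cpp
#include <cstdint>
#include "llama.h"

// Per the commit message: until pooled embeddings can be split across
// ubatches, recurrent models handle one sequence per ubatch.
static uint32_t max_seqs_per_ubatch(const struct llama_model * model, bool pooled) {
    if (pooled && llama_model_is_recurrent(model)) {
        return 1;
    }
    return UINT32_MAX; // no extra restriction for non-recurrent models
}
```
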
* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc
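
A hedged sketch of what "use fp16fp16fp16" suggests: a oneDNN (DNNL) matmul with f16 source, weights, and destination. Shapes and engine selection are illustrative; per "use dnnl for intel only", the real integration only takes this path on Intel GPUs:

```cpp
#include "oneapi/dnnl/dnnl.hpp"

void f16_matmul_example() {
    using namespace dnnl;
    engine eng(engine::kind::gpu, 0);
    stream  strm(eng);

    const memory::dim M = 64, K = 128, N = 32;
    memory::desc a_md({M, K}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc b_md({K, N}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc c_md({M, N}, memory::data_type::f16, memory::format_tag::ab);

    // memory objects allocate on the engine; real code wraps existing buffers
    memory a_mem(a_md, eng), b_mem(b_md, eng), c_mem(c_md, eng);

    matmul::primitive_desc pd(eng, a_md, b_md, c_md);
    matmul(pd).execute(strm, {{DNNL_ARG_SRC,     a_mem},
                              {DNNL_ARG_WEIGHTS, b_mem},
                              {DNNL_ARG_DST,     c_mem}});
    strm.wait();
}
```
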
Nexesenex merged commit 76ac396 into Nexesenex:spacestream on Aug 22, 2024
20 checks passed