forked from LostRuins/koboldcpp
b3215 #185
Merged
Conversation
Nexesenex (Owner) commented on Jun 24, 2024
- I have read the contributing guidelines
- Self-reported review complexity:
  - Low
  - Medium
  - High
* Adding a simple bare-bones end-to-end integration test for JSON validation against auto-generated JSON-schema grammars.
* Adding additional examples as documented in #7789. Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.
* Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs.
* Merging improved schema test methods added by @ochafik in #7797.
* Adding a #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework.
* Fixing nits from ochafik: removing escape slashes, adding additional failing cases, fixing some other strings.
* Fixing grammar indentation to be consistent throughout the file.
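The tests described above pair each JSON schema with strings that should and should not validate against the auto-generated grammar. The real harness matches strings against GBNF grammars in C++; the sketch below is only a hypothetical Python illustration of that pass/fail structure, using the third-party jsonschema package as a stand-in oracle (the schema, cases, and helper names are made up).

```python
# Hypothetical sketch of the pass/fail test shape described above.
# The real tests generate a GBNF grammar from each schema and match
# candidate strings against it in C++; here the third-party `jsonschema`
# package stands in as the validation oracle so the structure is runnable.
import json
from jsonschema import Draft7Validator

SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}

# (document, should_validate) pairs, mirroring the passing and failing
# cases enumerated per schema in the integration test.
CASES = [
    ('{"name": "ada", "age": 36}', True),
    ('{"age": 36}', False),   # missing required "name"
    ('{"name": 42}', False),  # wrong type for "name"
]

def run_cases(schema: dict, cases: list[tuple[str, bool]]) -> None:
    validator = Draft7Validator(schema)
    for doc, expected in cases:
        ok = validator.is_valid(json.loads(doc))
        status = "OK      " if ok == expected else "MISMATCH"
        print(f"{status} {doc!r} -> valid={ok}, expected={expected}")

if __name__ == "__main__":
    run_cases(SCHEMA, CASES)
```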
…alues (#8058)
Uses the values computed by @JohannesGaessler in PR #7413.
(#8052)
* Update negative.txt
* Update positive.txt
* Update cvector-generator.cpp
* Update cvector-generator.cpp
* cvector: fix CI + correct help message
* also correct --pca-iter
* Refactor Vulkan backend to allow multiple contexts
* Fix "too many shader groups called" validation error in llama3 on AMD and Intel GPUs
* Fix Vulkan debug build error
* test-backend-ops : increase cpy max nmse
* server ci : disable thread sanitizer
* hf bitnet v1
* hf bitnet e2e v2
* finish bitnet e2e
* finish f16 hf bitnet e2e
* remove unused
* finish bitnet i2 e2e
* move i2s to quantize v1
* move i2 to quantize
* clean code
* clean code 2
* fix codestyle
* fix code
* fix
* fix code
* fix merge
* remove unused
* change table name
* fix whitespace
* delete redundant
* i2_s to absmax
* finish i2_s/i8_s vec_dot x86 simd
* i2s->q22
* fix code
* remove block scale
* add dequantize
* fix seq
* update avx2
* remove q2_2
* remove q22_grid
* fix whitespace
* reuse llm_build_kv
* fix bo

Co-authored-by: root <root@wangjinheng>
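The "i2_s to absmax" and "remove block scale" items above refer to storing BitNet-style low-bit weights with a single per-tensor absmax scale and computing dot products against int8 activations. The sketch below is a loose numpy illustration of that general idea under those assumptions; it is not the ggml i2_s layout, bit packing, or SIMD vec_dot kernel.

```python
# Loose numpy illustration of absmax-scaled low-bit weights with int8
# activations. Not the ggml i2_s format: no bit packing, no block layout,
# no SIMD; the shapes and scaling choices here are assumptions.
import numpy as np

def quantize_w_absmax_ternary(w: np.ndarray):
    """One absmax scale per tensor (no block scales), ternary {-1, 0, +1} codes."""
    scale = float(np.abs(w).max())
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def quantize_x_absmax_int8(x: np.ndarray):
    """Per-vector absmax scaling of activations into int8."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def vec_dot(qw, w_scale, qx, x_scale) -> float:
    """Integer accumulate, then rescale to float (the role of a vec_dot kernel)."""
    acc = int(np.dot(qw.astype(np.int32), qx.astype(np.int32)))
    return acc * w_scale * x_scale

# For genuinely near-ternary (BitNet-style) weights the approximation is
# close; for arbitrary float weights this scheme is very lossy.
w = np.random.choice([-1.0, 0.0, 1.0], size=256) * 0.02
x = np.random.randn(256).astype(np.float32)
qw, sw = quantize_w_absmax_ternary(w)
qx, sx = quantize_x_absmax_int8(x)
print("quantized dot:", vec_dot(qw, sw, qx, sx), " float dot:", float(w @ x))
```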
* ggml : remove ggml_task_type and GGML_PERF
* check abort_callback on main thread only
* vulkan : remove usage of ggml_compute_params
* remove LLAMA_PERF
…el variants (#5763)
* gguf-py : add T5 model architecture
* gguf-py : add separate tensors for encoder and decoder
* gguf-py : add new model header parameters: decoder_start_token_id, attention.relative_buckets_count, tokenizer.ggml.remove_extra_whitespaces, tokenizer.ggml.precompiled_charsmap
* convert-hf : add model conversion support for T5ForConditionalGeneration and T5WithLMHeadModel

Co-authored-by: Stanisław Szymczyk <[email protected]>
* add parameters for embeddings: --embd-normalize, --embd-output-format, --embd-separator, with descriptions in the README.md
* Update README.md: fix typo
* Trailing whitespace
* fix json generation, use " not '
* fix merge master
* fix code formatting; group of parameters // embedding; print usage for embedding parameters

Co-authored-by: Brian <[email protected]>
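A rough sketch of what a normalization switch like --embd-normalize typically selects is shown below; the integer-to-norm mapping is an assumption for illustration only (the authoritative description is in the embedding example's README added by this commit).

```python
# Rough numpy sketch of embedding normalization modes; the integer-to-norm
# mapping below is assumed for illustration and may not match the flag exactly.
import numpy as np

def normalize(emb: np.ndarray, mode: int) -> np.ndarray:
    if mode < 0:                         # no normalization
        return emb
    if mode == 0:                        # scale by the max absolute component
        return emb / np.abs(emb).max()
    return emb / np.linalg.norm(emb, ord=mode)  # 1 = taxicab, 2 = Euclidean, ...

v = np.array([3.0, -4.0])
print(normalize(v, 2))   # -> [ 0.6 -0.8], unit Euclidean length
```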
* support splits in convert.py
* Support split by size and dry run to write estimated shards/filesizes
* Move split functionality to new GGUFManager class
* fix improper function signature
* tentative push of convert-hf-to-gguf support
* resolve merge + SplitArguments for easier parsing
* Fix eager tensor memory leak and remove convert.py changes: removed a memory leak caused by unexpected reference retention to eager tensors. Also removed GGUFManager functionality in convert.py in favor of specializing for convert-hf-to-gguf.py.
* refactor SplitStrategy to be a deque: instead of having SplitStrategy have a `data` field that is a deque, just have SplitStrategy be a subclass of deque itself.
* fix Q8 quantization
* remove unnecessary imports in gguf_manager
* fix final? merge issue
* fix gguf_writer placement and remove comments
* oops, actually fix gguf_writer placement
* reduce duplicated code from gguf_writer
* further simplify GGUFManager
* simplify even further and standardize with GGUFWriter
* reduce diffs with master
* form shards while adding tensors, SHA256 sums agree with master
* re-add type hint (Co-authored-by: compilade <[email protected]>)
* GGUFWriter compatibility fix (Co-authored-by: compilade <[email protected]>)
* Shard dataclass and un-negative dont_add_architecture
* type consistency in format_n_bytes_to_str
* move kv keys to constants.py
* make pathlib explicit
* base-1024 bytes to base-1000
* rename GGUFManager to GGUFWriterSplit
* Update gguf-py/gguf/constants.py (Co-authored-by: compilade <[email protected]>)
* fix convert-hf-to-gguf.py permissions
* fix line endings
* Update gguf-py/gguf/gguf_writer_split.py (Co-authored-by: compilade <[email protected]>)
* convert-hf : restore executable file permission
* examples/convert-legacy-llama.py: restore executable file permission
* reinstate original gguf package import and fix type annotation
* attempt to appease the linter
* attempt 2 to appease the linter
* attempt 3 to appease the linter
* comma consistency
* Update convert-hf-to-gguf.py (Co-authored-by: compilade <[email protected]>)
* edit cmd line args
* use simplification from #7827
* kv/ti data are still wrong
* try to refactor kv data (still fails)
* fix ti data messiness
* tidy up
* fix linting
* actually make the linter happy
* cleanup round 1
* remove SplitStrategy, SplitArguments
* appease linter
* fix typing and clean up
* fix linting
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* progress bar, fix split logic
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* catch oversights
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* swap bar orders
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* compatibility fix
* Update gguf-py/gguf/gguf_writer.py (Co-authored-by: compilade <[email protected]>)
* Update convert-hf-to-gguf.py (Co-authored-by: compilade <[email protected]>)

Co-authored-by: Brian <[email protected]>
Co-authored-by: compilade <[email protected]>
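The split support described above decides shard boundaries while tensors are being added, with a dry-run mode that only reports estimated shard sizes. The sketch below is a toy Python illustration of that split-by-max-size decision; the class and function names are invented and do not reflect the gguf-py GGUFWriter API.

```python
# Toy sketch of split-by-size sharding as described above. The real
# implementation lives in gguf-py's GGUFWriter; names here are made up.
from dataclasses import dataclass, field

@dataclass
class Shard:
    tensors: list[tuple[str, int]] = field(default_factory=list)  # (name, nbytes)
    nbytes: int = 0

def plan_shards(tensors: list[tuple[str, int]], max_shard_bytes: int) -> list[Shard]:
    """Greedily start a new shard whenever adding a tensor would exceed the limit."""
    shards = [Shard()]
    for name, nbytes in tensors:
        cur = shards[-1]
        if cur.tensors and cur.nbytes + nbytes > max_shard_bytes:
            cur = Shard()
            shards.append(cur)
        cur.tensors.append((name, nbytes))
        cur.nbytes += nbytes
    return shards

# Dry run: print estimated shard sizes, similar in spirit to the
# "dry run to write estimated shards/filesizes" item above.
tensors = [("tok_embd.weight", 500), ("blk.0.attn_q.weight", 300),
           ("blk.0.attn_k.weight", 300), ("output.weight", 500)]
for i, shard in enumerate(plan_shards(tensors, max_shard_bytes=800)):
    print(f"shard {i}: {len(shard.tensors)} tensors, {shard.nbytes} bytes")
```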
* CUDA: optimize MMQ int8 tensor core performance
* only a single get_mma_tile_x_k function
* simplify code, make functions constexpr
(#8090)
Co-authored-by: Stanisław Szymczyk <[email protected]>
Co-authored-by: Brian <[email protected]>
github-actions bot added the Nvidia GPU, testing, examples, python, server, ggml, devops, Vulkan, and build labels on Jun 24, 2024