Skip to content
This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Default to f16 model memory k/v in llm CLI and InferenceSessionConfig #296

Merged
merged 2 commits into from
Jun 7, 2023

Conversation

KerfuffleV2
Copy link
Contributor

Using 32bit values doesn't increase quality in a measurable way, see discussion here: ggerganov/llama.cpp#1593

The current default doubles memory consumption/size of prompt caches without a measurable upside. This pull changes the default to 16bit.

Allows --float16 for backward compatibility.

Adds --no-float16 flag to allow using 32bit memory.


Don't know if it's just me but I can't even run llm llama infer --help from the main branch:

The application panicked (crashed).
Message:  Command infer: Argument names must be unique, but 'vocabulary_path' is in use by more than one argument or group

Allows --float16 for backward compatibility.

Adds --no-float16 flag to enable using f16 memory.
@LLukas22
Copy link
Contributor

LLukas22 commented Jun 6, 2023

LGTM!

But shouldn't we also change the defaults for the InferenceSessionConfig to make this the default for all libraries using llm?


Regarding the --help parameter the application only panics in debug mode, if you run it in release mode via cargo run --release -- llama infer -help it works as expected. I will create an issue for this.

@KerfuffleV2 KerfuffleV2 changed the title Default to f16 model memory k/v in llm CLI Default to f16 model memory k/v in llm CLI and InferenceSessionConfig Jun 6, 2023
@KerfuffleV2
Copy link
Contributor Author

@LLukas22 Thanks! I missed that one.

Regarding the --help parameter the application only panics in debug mode

It's really weird that compiling in release affects argument parsing. I'd never have guessed to try a different build type.

@LLukas22
Copy link
Contributor

LLukas22 commented Jun 7, 2023

Yeah clap seams to validate duplicated parameters only in debug mode. 🤷

Thanks for changing the defaults, everything looks good now. 👍

@LLukas22 LLukas22 merged commit 85d6468 into rustformers:main Jun 7, 2023
@KerfuffleV2 KerfuffleV2 deleted the chore-default-16bit-mem branch July 7, 2023 12:23
@hhamud hhamud mentioned this pull request Aug 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants