Default to f16 model memory k/v in llm CLI and InferenceSessionConfig #296

KerfuffleV2 · 2023-06-05T10:39:44Z

Using 32-bit values for the model's k/v memory doesn't increase quality in a measurable way; see the discussion in ggerganov/llama.cpp#1593.

The current 32-bit default doubles the k/v memory consumption and the size of prompt caches without a measurable upside. This pull changes the default to 16-bit.
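To put a rough number on the doubling, here is a back-of-the-envelope sketch. It assumes the k/v memory holds one key and one value tensor per layer, each with `n_ctx * n_embd` elements, and it uses an illustrative LLaMA-7B-like shape (32 layers, 4096 embedding size, 2048 context). The function name is made up for this example and is not part of llm.

```rust
/// Bytes needed for the attention k/v memory.
/// Assumption: one K and one V tensor per layer, each holding
/// `n_ctx * n_embd` elements of `bytes_per_element` bytes.
fn kv_memory_bytes(n_layer: u64, n_ctx: u64, n_embd: u64, bytes_per_element: u64) -> u64 {
    2 * n_layer * n_ctx * n_embd * bytes_per_element
}

fn main() {
    // Illustrative shape only (assumed, LLaMA-7B-like).
    let (n_layer, n_ctx, n_embd) = (32, 2048, 4096);

    let f32_bytes = kv_memory_bytes(n_layer, n_ctx, n_embd, 4);
    let f16_bytes = kv_memory_bytes(n_layer, n_ctx, n_embd, 2);

    println!("f32 k/v memory: {} MiB", f32_bytes / (1024 * 1024));
    println!("f16 k/v memory: {} MiB", f16_bytes / (1024 * 1024));
}
```

Under those assumptions the full-context k/v memory is 2048 MiB with 32-bit values and 1024 MiB with 16-bit values, and a prompt cache that stores this memory shrinks accordingly.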

Allows --float16 for backward compatibility.

Adds --no-float16 flag to allow using 32bit memory.
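The intended flag behaviour can be summarised with a small sketch. This is not the actual CLI code (which goes through clap); `use_f16_memory` is a hypothetical helper, and letting `--no-float16` win when both flags are passed is an assumption made here for illustration.

```rust
/// Returns true when the k/v memory should use 16-bit values.
/// Hypothetical helper, only meant to illustrate the flag semantics.
fn use_f16_memory(args: &[&str]) -> bool {
    // `--float16` is still accepted but is now redundant, since f16 is the default.
    // `--no-float16` opts back into 32-bit memory (assumed to take precedence).
    !args.iter().any(|arg| *arg == "--no-float16")
}

fn main() {
    println!("no flags      -> f16: {}", use_f16_memory(&[]));
    println!("--float16     -> f16: {}", use_f16_memory(&["--float16"]));
    println!("--no-float16  -> f16: {}", use_f16_memory(&["--no-float16"]));
}
```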

Don't know if it's just me but I can't even run llm llama infer --help from the main branch:

The application panicked (crashed).
Message:  Command infer: Argument names must be unique, but 'vocabulary_path' is in use by more than one argument or group

Allows --float16 for backward compatibility. Adds --no-float16 flag to enable using 32-bit memory.

LLukas22 · 2023-06-06T09:00:05Z

LGTM!

But shouldn't we also change the defaults for the InferenceSessionConfig to make this the default for all libraries using llm?
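Roughly what I have in mind, as a sketch with simplified stand-in types. `KvMemoryType` and `SessionConfig` are hypothetical names for this example, not the crate's actual definitions; the point is only that `Default` should hand out 16-bit memory while still letting callers opt into 32-bit.

```rust
/// Stand-in for the element type of the k/v memory (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KvMemoryType {
    Float16,
    Float32,
}

/// Stand-in for a session configuration (hypothetical, heavily simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SessionConfig {
    memory_k_type: KvMemoryType,
    memory_v_type: KvMemoryType,
}

impl Default for SessionConfig {
    fn default() -> Self {
        // Library users who never touch the config get 16-bit k/v memory.
        Self {
            memory_k_type: KvMemoryType::Float16,
            memory_v_type: KvMemoryType::Float16,
        }
    }
}

fn main() {
    let default_cfg = SessionConfig::default();
    // Opting back into 32-bit memory stays possible by setting the fields explicitly.
    let f32_cfg = SessionConfig {
        memory_k_type: KvMemoryType::Float32,
        memory_v_type: KvMemoryType::Float32,
    };
    println!("default:  {:?}", default_cfg);
    println!("explicit: {:?}", f32_cfg);
}
```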

Regarding the --help parameter: the application only panics in debug mode. If you run it in release mode via cargo run --release -- llama infer --help, it works as expected. I will create an issue for this.

KerfuffleV2 · 2023-06-06T13:42:39Z

@LLukas22 Thanks! I missed that one.

> Regarding the --help parameter the application only panics in debug mode

It's really weird that compiling in release affects argument parsing. I'd never have guessed to try a different build type.

LLukas22 · 2023-06-07T07:10:55Z

Yeah, clap seems to validate duplicate argument names only in debug mode. 🤷
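As far as I understand it, this is the usual debug-assertion pattern: the check is only compiled in when debug assertions are enabled, which is not the case for a default `--release` build. A minimal sketch of that pattern using only the standard library (the function names are made up for this example and are not clap's internals):

```rust
use std::collections::HashSet;

/// Returns the first name that appears more than once, if any.
fn first_duplicate<'a>(names: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the name was already present.
    names.iter().copied().find(|name| !seen.insert(*name))
}

/// Hypothetical registration step. The uniqueness check is a debug assertion,
/// so it only runs when debug assertions are enabled (the default for debug
/// builds, disabled by default for `--release`).
fn register_args(names: &[&str]) -> usize {
    debug_assert!(
        first_duplicate(names).is_none(),
        "Argument names must be unique, but '{}' is in use by more than one argument",
        first_duplicate(names).unwrap_or_default()
    );
    names.len()
}

fn main() {
    println!("debug assertions enabled: {}", cfg!(debug_assertions));
    println!("registered {} arguments", register_args(&["model_path", "prompt"]));
}
```

Calling `register_args` with a duplicated name panics in a debug build and passes silently in a default release build, which matches the behaviour we saw.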

Thanks for changing the defaults, everything looks good now. 👍

Default to f16 model memory k/v in llm CLI

5934195

Allows --float16 for backward compatibility. Adds --no-float16 flag to enable using 32-bit memory.

LLukas22 mentioned this pull request Jun 6, 2023

--help Parameter panics in debug mode #297

Closed

Also default to 16bit memory for InferenceSessionConfig

7106609

KerfuffleV2 changed the title ~~Default to f16 model memory k/v in llm CLI~~ Default to f16 model memory k/v in llm CLI and InferenceSessionConfig Jun 6, 2023

LLukas22 merged commit 85d6468 into rustformers:main Jun 7, 2023

KerfuffleV2 deleted the chore-default-16bit-mem branch July 7, 2023 12:23

hhamud mentioned this pull request Aug 7, 2023

Write a 0.2 changelog #244

Open
