
Related to stop prompt #151

Closed
hlhr202 opened this issue Apr 23, 2023 · 5 comments

Comments

@hlhr202
Contributor

hlhr202 commented Apr 23, 2023

Hi, some models have very poor stopping logic (e.g. Vicuna under instruct mode).
llama-rs could provide a way (like a stop prompt argument) to stop generation when a given text sequence appears.
What I've investigated so far is as follows:

https://github.com/Atome-FE/llama-node/blob/main/packages/llama-cpp/src/llama.rs#L152
https://github.com/sobelio/llm-chain/blob/main/llm-chain-llama/src/executor.rs#L96

I think llama.cpp also provides something called a reverse prompt or antiprompt (but they only use it in interactive mode).
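
To illustrate what a stop prompt argument could look like, here is a minimal sketch of the text-based variant (all names below are placeholders, not the llama-rs API): decode tokens as they arrive and halt once the output ends with one of the configured stop sequences.

```rust
// Minimal sketch of text-based stop sequences; names are placeholders,
// not the llama-rs API.
fn should_stop(generated_text: &str, stop_sequences: &[String]) -> bool {
    // Stop once the decoded output ends with any configured stop sequence.
    stop_sequences
        .iter()
        .any(|stop| generated_text.ends_with(stop.as_str()))
}

fn main() {
    let stop_sequences = vec!["### Human:".to_string(), "</s>".to_string()];
    let mut output = String::new();

    // Pretend these pieces arrive one at a time from the model's token callback.
    for piece in ["Sure", ", here", " you go.", "\n### Human:"] {
        output.push_str(piece);
        if should_stop(&output, &stop_sequences) {
            // Trim the stop sequence off `output` before returning it, and
            // halt inference instead of feeding the stop text back to the model.
            break;
        }
    }
    println!("{output}");
}
```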

@LLukas22
Contributor

I agree that adding stop word/sequence support natively would be helpful for using models in a chatbot setting, especially ones that aren't specifically trained for that purpose. Perhaps we could add a list of strings to the InferenceParameters, tokenize them, and then match the last N generated tokens against the tokenized stop words. This should be easy enough and would prevent models from talking to themselves 😓.
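
Roughly, that token-level matching could look like the following sketch (the token ids and types are made up for illustration; this is not the llama-rs API):

```rust
// Sketch of the suggestion above: tokenize each stop sequence up front and
// compare it against the tail of the generated token stream.
fn hits_stop_sequence(generated: &[u32], stop_token_seqs: &[Vec<u32>]) -> bool {
    stop_token_seqs.iter().any(|stop| {
        !stop.is_empty()
            && stop.len() <= generated.len()
            && generated[generated.len() - stop.len()..] == stop[..]
    })
}

fn main() {
    // Stand-ins for the tokenized forms of "### Human:" and the end-of-text token.
    let stop_token_seqs: Vec<Vec<u32>> = vec![vec![835, 12968, 29901], vec![2]];

    let mut generated: Vec<u32> = Vec::new();
    for token in [3575, 1234, 29889, 835, 12968, 29901] {
        generated.push(token);
        if hits_stop_sequence(&generated, &stop_token_seqs) {
            break; // halt inference before the stop sequence is fed back in
        }
    }
    assert_eq!(generated.len(), 6);
}
```

One thing to watch out for: tokenization is context-dependent, so the tokens for a stop word in isolation may not match how the model emits it mid-generation; matching on the decoded text instead sidesteps that.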

@danforbes
Contributor

danforbes commented Apr 29, 2023

Here is a naive implementation: https://github.com/danforbes/llama-rs/tree/dfo/feat/chat

@philpax
Collaborator

philpax commented May 1, 2023

@danforbes Sorry, I keep forgetting to get back to you on this! Yeah, that seems fine to me. Want to add antiprompt: Option<&str> to inference_with_prompt?

@danforbes
Contributor

danforbes commented May 3, 2023

You need to provide more information than just the "reverse prompt": among other things, you need some kind of callback for receiving user input. There are also some changes to the way returned tokens are handled (EOT handling as well as reverse prompt handling) that I wasn't sure how to implement cleanly in the existing inference_with_prompt function.

Edit: Also, the existing inference_with_prompt function already has too many arguments, so I'm very reluctant to add even more. In this branch I took a stab at condensing the args to the inference function: https://github.com/danforbes/llama-rs/tree/dfo/feat/chat
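
As an illustration, "condensing the args" could mean bundling them into a single request struct with the antiprompt as an optional field (all names below are hypothetical, not the actual types in that branch):

```rust
// Hypothetical sketch of bundling the inference arguments into one struct
// with defaults; none of these are the actual llama-rs types.
#[derive(Default)]
struct InferenceRequest<'a> {
    prompt: &'a str,
    maximum_token_count: Option<usize>,
    antiprompt: Option<&'a str>,
}

fn infer(request: &InferenceRequest<'_>) {
    // A real implementation would run the model here, stopping early if the
    // antiprompt (when set) shows up in the generated output.
    println!(
        "prompt: {:?}, max tokens: {:?}, antiprompt: {:?}",
        request.prompt, request.maximum_token_count, request.antiprompt
    );
}

fn main() {
    infer(&InferenceRequest {
        prompt: "Write a haiku about Rust.",
        antiprompt: Some("### Human:"),
        ..Default::default()
    });
}
```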

@danforbes
Contributor

@hlhr202 this should be implemented with #206, but please open another Issue if you have more suggestions 🙏🏻
