Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode #333
Conversation
Aims to improve coherence and ability to resume the interactive session when the user is given input back after an end of text token is reached. Not sure what token 13 is or why it seems to help. See conversation for examples.
The model was (presumably) trained to ignore everything before the eos token. Token 13 is `\n`, so you are replacing the end of text token with a new line, and the model will interpret it as such.
Thank you very much for the explanation! That makes a lot of sense now, actually. I really do want this PR to make it in, as I feel the current behavior is less than ideal: it gives the user control back just for the context to be more or less gone simply because it reached the end of text. However, it does raise the question of whether users wishing to avoid this behavior should just rely on the `--ignore-eos` argument instead. I would be interested in future proofing this PR by dynamically determining the newline token instead of hardcoding token 13.
To find the token id dynamically you could do something like this in main, after the call to llama_model_load and before the main loop:

```cpp
const auto newline_token_id = vocab.token_to_id["\n"];
```

This cannot be done in the same way for the eos token because it is one of many special tokens that map to an empty string in llama.cpp. The way the tokenizer is exported would need to be changed to be able to find this token dynamically and unambiguously.
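As a follow-up sketch of how that lookup could then replace the magic number elsewhere (names such as `emb` and `last_n_tokens` follow this thread rather than the actual source, so treat the placement as illustrative):

```cpp
// After llama_model_load has populated vocab, before the generation loop:
const auto newline_token_id = vocab.token_to_id["\n"];

// Later, wherever the code previously hardcoded 13, for example:
//   emb.back()           = 13;
//   last_n_tokens.back() = 13;
// it can now use the dynamically determined id instead:
emb.back()           = newline_token_id;
last_n_tokens.back() = newline_token_id;
```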
Ok. The newline token is now determined dynamically and no longer a magic number or static constant. Let me know if it should be placed somewhere else. Compiled and tested it several times and seems to work the same.
It doesn't compile as is because you have made the constant a local in
Wait really, that's strange. I thought it compiled fine. I'll fix it. Edit: Pushed what should be the last changes. Checked and it still appears to work.
Hold on, something isn't behaving right. Forgive me while I investigate. Edit: I understand what's going on. I was debugging end of text with
Whereas if I print a new line as well when the end of text token is reached, the above scenario becomes this.
Where the square represents where it gives me back control with my reverse prompt. Edit 2: I'm going to examine behavior with this change when not using a reverse prompt. Edit 3: And I will also examine behavior between llama and alpaca weights.
this may need to be looked into further when not using a reverse prompt
There is a difference in behavior as well between the alpaca fine tunes floating around and the original llama models regarding token 13. Just an fyi.
That's weird. I merged 330b86e and I didn't have the newline issue. Maybe it has to do with other tweaks I have?
I'm not sure, but it seems to happen consistently for me when using a reverse prompt. Upon end of text being thrown, it will place the reverse prompt at the end of the current line instead of on a new line unless I manually print a new line. Behavior when not using a reverse prompt seems fine, I think. Maybe only print a new line with my change when in reverse prompt mode? Something along these lines? (This is a bit embarrassing, but I don't know how to cleanly put this into the end of text token block; what I tried doesn't compile.)
Edit:
I think this works? Edit 2: I think this is the best I can get it. I noticed that if it printed new lines when not in reverse prompt mode, it would sometimes add its own new line and then my manual new line (so basically an empty, abrupt new line out of nowhere). But in reverse prompt mode, that manual newline seems mandatory because it won't add one itself when giving me back control upon the end of text token.
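For readers following along, this is roughly the shape of what is being described: only print the manual newline when a reverse prompt is in use. It is a sketch against the pre-refactor main.cpp; names such as `emb`, `params.antiprompt`, `EOS_TOKEN_ID`, and `is_interacting` are assumptions drawn from this thread and the code of the time, not a copy of the actual commit.

```cpp
// Inside the generation loop, after the sampled token has been appended to emb.
if (emb.back() == EOS_TOKEN_ID) {
    if (!params.antiprompt.empty()) {
        // Only force a newline in reverse prompt mode, so the reverse prompt
        // starts on a fresh line instead of being glued to the end of the
        // model's last line of output.
        printf("\n");
    }
    // Hand control back to the user instead of ending the session.
    is_interacting = true;
}
```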
Here is a suggestion: notice that the token is generated in main.cpp:~1003, in this line:

```cpp
id = llama_sample_top_p_top_k(vocab, logits.data() + (logits.size() - n_vocab), last_n_tokens, repeat_penalty, top_k, top_p, temp, rng);
```

At this point you could check if the generated token is eos and replace it with a newline, for example:

```cpp
if (id == EOS_TOKEN_ID && params.interactive) {
    id = NEWLINE_TOKEN_ID;
}
```

This should remove the need to print the new line or do anything else yourself, since it will be treated and printed automatically later on as if it came from the model.
fix formatting of reverse prompts so they don't end up at the end of the current line while not introducing unnecessary new lines otherwise
Ah, I see. I didn't notice this until after pushing another change. I'll look into this. Thank you. Edit: Correct me if I'm wrong, but wouldn't this be getting very close to what the `--ignore-eos` argument does? 50fae10
Not entirely: `--ignore-eos` prevents eos from being sampled at all in the first place by setting its logit (more or less its odds) to zero. The model doesn't actually return one specific token; it returns the odds of all the possible tokens, and then one is sampled randomly based on these odds. So any other token could be sampled instead, based on the odds returned by the model. What you want to do here is different: you want to allow eos to be sampled and then pretend that it was actually a new line.
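To make the distinction concrete, here is a hedged sketch of the sampling site in main.cpp at the time; the `--ignore-eos` indexing, `EOS_TOKEN_ID`, and `NEWLINE_TOKEN_ID` are reconstructed from this discussion rather than quoted from the source, so treat the details as approximate.

```cpp
// Option A (--ignore-eos): zero the eos logit *before* sampling, so the end
// of text token effectively never wins the sampling step in the first place.
if (params.ignore_eos) {
    logits[logits.size() - n_vocab + EOS_TOKEN_ID] = 0.0f;
}

id = llama_sample_top_p_top_k(vocab, logits.data() + (logits.size() - n_vocab),
                              last_n_tokens, repeat_penalty, top_k, top_p, temp, rng);

// Option B (this PR): allow eos to be sampled normally, then pretend the model
// produced a newline instead, so context is kept and interactive mode resumes.
if (id == EOS_TOKEN_ID && params.interactive) {
    id = NEWLINE_TOKEN_ID;
}
```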
Hmm, I'm trying what you suggested and I cannot get it to throw an end of text token at all now, and it never gives back interactive control. It just endlessly generates, very similarly to `--ignore-eos`.
Ah, I think you also need to add
Ohh, this is coming before the
Ouch, new problem with that approach. When using a reverse prompt, it does not print the reverse prompt and simply prints a new empty line with control given back to the user. However, that is a problem with the other approach as well. I noticed that it seems to only print the first token of the reverse prompt, missing the ":" and only printing "User", which is its own token.
The other approach allows it to generate one more token before returning control to the user (since the logic happens at the end of the loop). In some cases it could be the beginning of the reverse prompt, but it would not be guaranteed.
Well, trying to get something together. I feel the changes I made are still better than the former behavior, but I want the reverse prompt stuff to work well with these changes. Edit: This might just have to do; I don't think there's a good way to append the reverse prompt to the output upon an end of text token being replaced with a newline token, especially considering there can be multiple reverse prompts. I don't think the user typing out their name when given back control is too terrible, especially considering it only happens when an end of text token is reached, which isn't desired to begin with. Currently this is how it looks.
Everything but tokenizing and injecting the reverse prompt is now up to date with the API refactor. Not sure how to tokenize the reverse prompt now. Edit: Think I've figured it out, this PR should be ready shortly.
this doesn't seem right though
This works and doesn't crash, but I don't like my solution for making it work. (This is probably really simple.)
I need

Edit: Actually, the inject part isn't working at all now. It doesn't seem to want to work if the reverse prompt tokenization is declared earlier on.
this doesn't seem to work if the reverse prompt is tokenized outside earlier on
I think I finally have this working and it's good to merge.
Double checked and multiple reverse prompts work fine and it injects just the first one. There's no crash when not using a random prompt, and no unintended behavior that I can see.
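For anyone skimming to the end result, the behavior described above roughly takes this shape after the API refactor. This is a hedged reconstruction rather than a verbatim quote of the merged change; in particular `llama_token_eos()`, `llama_token_newline`, `llama_tokenize`, and `embd_inp` are my assumptions about the post-refactor names.

```cpp
// Sketch: inside the generation loop, right after a token id has been sampled.
// Replace end of text with a newline token in interactive mode so the context
// is not treated as flushed, and start the next turn with the first reverse
// prompt so the dialogue formatting stays intact.
if (id == llama_token_eos() && params.interactive) {
    // llama_token_newline is assumed to hold the tokenization of "\n",
    // computed once before the loop.
    id = llama_token_newline.front();
    if (!params.antiprompt.empty()) {
        // Tokenize and inject only the first reverse prompt.
        const auto first_antiprompt = ::llama_tokenize(ctx, params.antiprompt.front(), false);
        embd_inp.insert(embd_inp.end(), first_antiprompt.begin(), first_antiprompt.end());
    }
}
```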
Co-authored-by: Georgi Gerganov <[email protected]>
I think this is finally ready to go. No issues with it that I can see and the reverse prompt injection works consistently. Appreciate all the help!
Edit: Most of the below is now outdated. This PR aims to do two things.
- Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode
- Better reverse prompt behavior on EOS by injecting the first given reverse prompt on the newline upon EOS
Aims to improve coherence and the ability to resume the interactive session when the user is given input back after an end of text token is reached. Not sure what token 13 is or why it seems to help, so I'm requesting input from someone more knowledgeable on this.
Forgive the crudeness of this PR. As of 368d0c8, interactive mode now continues and gives back user input when an end of text token is reached. This is great; however, there seems to be odd behavior after the user is given control back following the end of text token. The following are my observations (pasted mostly from another issue).
This is a bit of a tangent, but I've been looking further into the weird behavior when the end of text token occurs and gives the user control (without the use of `--ignore-eos`, meaning end of texts can occur), and would like to propose a change to 368d0c8. While I'm not super familiar here, changing the section to add these two lines seems to improve, if not outright fix, the weird lack of coherence that occurs after an end of text token. I am not qualified enough to speak on this technically, but I saw `emb.back() = 13` and `last_n_tokens.back() = 13` thrown around as a small hack to get around end of texts prior to 368d0c8.

Here are three excerpts from a reverse prompt dialogue WITHOUT this addition, the current behavior when an end of text token is reached. (I edited in the [end of text] parts for clarity to indicate when it gave me back control.)
A bit of second-hand embarrassment as it randomly started going on about the anime Naruto and fan-fiction.
Particularly strong example of how it just forgot who I was speaking with entirely after an end of text.
And here are two small excerpts WITH the above change when the end of text token is thrown.
I've tested this over the past day, and it seems pretty apparent that without `emb.back() = 13` and `last_n_tokens.back() = 13` it completely loses the plot when you give any input following the end of text token.

Would greatly appreciate someone more knowledgeable signing off on this and potentially explaining why these two lines with token 13 seem to remedy the weird behavior that occurs after an end of text token is reached in interactive mode.
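For reference, a sketch of where the two lines sit in the pre-refactor main.cpp's end of text handling; the surrounding block is reconstructed from this description and from how interactive mode behaves after 368d0c8, so treat it as illustrative rather than the literal diff.

```cpp
// When the end of text token shows up in interactive mode, rewrite it as a
// newline so the model does not treat everything before it as over and done.
if (emb.back() == EOS_TOKEN_ID) {
    emb.back()           = 13;  // 13 is the id of "\n" in the llama vocab
    last_n_tokens.back() = 13;  // keep the repeat-penalty window consistent
    is_interacting = true;      // hand control back to the user
}
```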
Also to be clear, this does not seem to affect the later lines pertaining to when remaining_tokens runs out. That seems to give the user control and allow for continuation of the session just fine with no lost coherence. So this is just about the end of text part.
Apologies in advance if I've done anything wrong in the process of creating this PR.