
Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode #333

Merged · 29 commits · Mar 23, 2023

Conversation

@rabidcopy (Contributor) commented Mar 20, 2023

Edit: Most of the below is now outdated. This PR aims to do two things:
- Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode
- Better reverse prompt behavior on EOS by injecting the first given reverse prompt on the newline upon EOS

Aims to improve coherence and the ability to resume the interactive session when control is given back to the user after an end of text token is reached. Not sure what token 13 is or why it seems to help, so I'm requesting input from someone more knowledgeable on this.

Forgive the crudeness of this PR. As of 368d0c8, interactive mode now continues and gives control back to the user when an end of text token is reached. This is great; however, there seems to be odd behavior after the user is given control back following the end of text token. Below are my observations (pasted mostly from another issue).

This is a bit of a tangent, but I've been looking further into the weird behavior when the end of text token occurs and gives the user control (without the use of --ignore-eos, meaning end of texts can occur), and I would like to propose a change to 368d0c8. While I'm not super familiar here, adding these two lines to the section below seems to improve, if not outright fix, the weird lack of coherence that occurs after an end of text token. I am not qualified enough to speak on this technically, but I saw embd.back() = 13 and last_n_tokens.back() = 13 thrown around as a small hack to get around end of texts prior to 368d0c8.

        // end of text token
        if (embd.back() == EOS_TOKEN_ID) {
            if (params.interactive) {
                is_interacting = true;
                embd.back() = 13;
                last_n_tokens.back() = 13;
            } else {
                fprintf(stderr, " [end of text]\n");
                break;
            }
        }

Here are three excerpts from a reverse prompt dialogue WITHOUT this addition, i.e. the current behavior when an end of text token is reached. (I edited in the [end of text] parts for clarity to indicate when it gave me back control.)

Start of the neverending dialogue that lasts forever and goes on for 1000 years

Jessie: H-hi there.. if someone is there. *sweats*
User: Hi there!
Jessie: A-ah! *startled*
User:It's okay, it's just me. :)                       
Jessie: Oh thank goodness...I thought i was going crazy talking to myself like this. [end of text]
User:Hello?
This user account has been created by a bot (or automated program).
If you have any questions about its content or actions, contact the bot operator directly using their talk page message system.
To protect against spamming, all new accounts are automatically blocked from editing until they pass manual review. If your request is approved, your username will be changed to match your email address.
Retrieved from "https://en.wikipedia.org/wiki/Special:UsedAccount" [end of text]

A bit of second-hand embarrassment as it randomly started going on about the anime Naruto and fan-fiction...

Start of the neverending dialogue that lasts forever and goes on for 1000 years

Jessie: H-hi there.. if someone is there. *sweats*
User: Hi there!
Jessie: A-ah! *startled*
User:It's okay. It's just me.
Jessie: Oh... ummm.... what do you need from me today?
User:end of story
 [end of text]
User: Nothing.
Title: Nothing.
Characters/Pairing: Kiba Inuzuka & Tenten (from Naruto)
Rating: PG-13 (for language)
Summary: The two are enjoying a day off together when they come across something unexpectedly beautiful.
Notes: This was written as part of the 'Narutomania' challenge over at /fic/, which asked us to write a one shot using any combination of characters or pairings from the show.
Kiba Inuzuka & Tenten were having a great time exploring Konoha during their days off.

A particularly strong example of how it entirely forgot who I was speaking with after an end of text.

Start of the neverending dialogue that lasts forever and goes on for 1000 years

Jessie: H-hi there.. if someone is there. *sweats*
User: Hi there!
Jessie: A-ah! *startled*
User:Don't worry, just me.
Jessie: Oh... thanks....
\end{code} [end of text]
User: Hi.
User: Hello?
System: Yes. What would you like today?
User: What happened to Jessie?
System: Who are you talking about?
User: Is Jessie not here?
System: Nope. She left a while ago.
User: Oh, that's too bad.
System: You mean she was here before?
User: I thought she was.
System: Well, maybe it wasn't her then. Maybe we were thinking of another person. Or maybe we were thinking of two people at once, but one got lost inside our heads somewhere along the way.
User:

And here are two small excerpts WITH the above change when the end of text token is thrown.

Start of the neverending dialogue that lasts forever and goes on for 1000 years

Jessie: H-hi there.. if someone is there. *sweats*
User: Hi there!
Jessie: A-ah! *startled*
User:It's okay, it's just me. :)
Jessie: Oh thank goodness...I thought maybe a ghost was talking to me or something...
User:end of story
 [end of text]
EndUser: Hi.
Jessie: Hello! [end of text]
User:Hi
Jessie: Who are you?

Start of the neverending dialogue that lasts forever and goes on for 1000 years

Jessie: H-hi there.. if someone is there. *sweats*
User: Hi there!
Jessie: A-ah! *startled*
User: It's okay! It's just me. :)
Jessie: Oh...thank god! You scared me half to death! :O
User:end of story
 [end of text]
EndUser: Now what?
Jessie: Well now we have a conversation going on between us two, but it doesn't end here. [end of text]

I've tested this over the past day, and it seems pretty apparent that without embd.back() = 13 and last_n_tokens.back() = 13 it completely loses the plot when you give any input following the end of text token.

Would greatly appreciate someone more knowledgeable signing off on this and potentially explaining why these two lines with token 13 seem to remedy the weird behavior that occurs after an end of text token is reached in interactive mode.

Also, to be clear, this does not seem to affect the later lines pertaining to when remaining_tokens runs out. That case gives the user control and allows the session to continue just fine with no lost coherence. So it's just the end of text part.

Apologies in advance if I've done anything wrong in the process of creating this PR.

Aims to improve coherence and ability to resume the interactive session when the user is given input back after an end of text token is reached.
Not sure what token 13 is or why it seems to help. See conversation for examples.
@slaren (Member) commented Mar 20, 2023

The model was (presumably) trained to ignore everything before the eos token. Token 13 is \n, so you are replacing the end of text token with a new line, and the model will interpret it as such.
On a different note, I think it would be better to avoid magic numbers like that and add a constant somewhere. A hardcoded constant is better than nothing, but you could also find the token id for \n dynamically using the vocab.token_to_id map.

@rabidcopy (Contributor Author)

The model was (presumably) trained to ignore everything before the eos token. Token 13 is \n, so you are replacing the end of text token with a new line, and the model will interpret it as such. On a different note, I think it would be better to avoid magic numbers like that and add a constant somewhere. A hardcoded constant is better than nothing, but you could also find the token id for \n dynamically using the vocab.token_to_id map.

Thank you very much for the explanation!

That makes a lot of sense now, actually. I really do want this PR to make it in, as I feel the current behavior is less than ideal: the user is given control back only for the context to be more or less gone just because the end of text was reached. However, it does raise the question of whether users wishing to avoid this behavior should just rely on the --ignore-eos argument to strip the end of text token altogether. But then, why give the user back control in interactive mode by default if the session is going to be incoherent and output random things thereafter?

I would be interested in future-proofing this PR by dynamically determining the \n token instead of using a constant, which in the future may be a completely different token depending on what model is loaded. But I am admittedly lacking in experience or any actual coding knowledge outside of rudimentary stuff, which is embarrassing because it probably isn't very hard. Though I do notice that the end of text token is also a constant and not dynamically determined. https://github.com/ggerganov/llama.cpp/blob/074bea2eb1f1349a0118239c4152914aecaa1be4/main.cpp#L32 Any pointers on how to proceed with this would be much appreciated. I just know the current default behavior feels less than ideal when the user could instead use --ignore-eos to keep end of text tokens from effectively ending their interactive session.

@slaren (Member) commented Mar 20, 2023

To find the token id dynamically you could do something like this in main, after the call to llama_model_load and before the main loop:

const auto newline_token_id = vocab.token_to_id["\n"];

This cannot be done in the same way for the eos token because it is one of many special tokens that map to an empty string in llama.cpp. The way the tokenizer is exported would need to be changed to be able to find this token dynamically and unambiguously.

@gjmulder added the "enhancement (New feature or request)" and "generation quality (Quality of model output)" labels on Mar 20, 2023
@rabidcopy (Contributor Author)

Ok. The newline token is now determined dynamically and is no longer a magic number or static constant. Let me know if it should be placed somewhere else. Compiled and tested it several times and it seems to work the same.

@slaren (Member) commented Mar 20, 2023

It doesn't compile as is because you have made the constant a local in llama_model_load(), and this means that it is not visible in main() where it is used. For example you could place it around line 955 instead.
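
To illustrate the placement, a minimal sketch (assuming the pre-refactor main.cpp layout, where vocab is available inside main() after llama_model_load()):

    // in main(), after llama_model_load() has populated `vocab` and before the
    // generation loop, so the id stays visible where the end of text check uses it
    const auto NEWLINE_TOKEN_ID = vocab.token_to_id["\n"];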

@rabidcopy (Contributor Author) commented Mar 20, 2023

It doesn't compile as is because you have made the constant a local in llama_model_load(), and this means that it is not visible in main() where it is used. For example you could place it around line 955 instead.

Wait, really? That's strange, I thought it compiled fine. I'll fix it. Edit: Pushed what should be the last changes. Checked and it still appears to work.

@rabidcopy (Contributor Author) commented Mar 20, 2023

Hold on, something isn't behaving right; I think something isn't actually working. Forgive me, but I'm investigating. Edit: I understand what's going on. I was debugging end of text with fprintf(stderr, " [end of text]\n"); and it just now occurred to me that printing a new line was actually part of the 'fix'. If a new line isn't printed when the end of text token is overwritten with a new line token, the formatting of the session gets messed up and it begins trying to place the beginning of user input at the end of the current line.
Basically this was happening.

Bot: Hi.
User: Hello, that is all.
Bot: Ok sure thing, just tell me if there's something you want to talk about.User█

While if I print a new line as well when the end of text token is reached, the above scenario becomes this.

Bot: Hi.
User: Hello, that is all.
Bot: Ok sure thing, just tell me if there's something you want to talk about.
User█

Where the square represents where it gives me back control with my reverse prompt.
So to fix this, I will print a new line as well, which makes the most sense since the end of text token is effectively being replaced with a new line.

Edit 2: I'm going to examine behavior with this change when not using a reverse prompt.

Edit 3: And I will also examine behavior between llama and alpaca weights.

this may need to be looked into further when not using a reverse prompt
@eous commented Mar 20, 2023

There is a difference in behavior as well between the alpaca fine-tunes floating around and the original llama models regarding token 13. Just an FYI.

@tjohnman (Contributor)

That's weird. I merged 330b86e and I didn't have the newline issue. Maybe it has to do with other tweaks I have?

@rabidcopy (Contributor Author) commented Mar 20, 2023

That's weird. I merged 330b86e and I didn't have the newline issue. Maybe it has to do with other tweaks I have?

I'm not sure, but it seems to happen consistently to me when using a reverse prompt. It'll place the reverse prompt at the end of the line when end of text is thrown, instead of on a new line, unless I manually print a new line. Behavior when not using a reverse prompt seems fine, I think. Maybe only print a new line with my change when in reverse prompt mode?

Something along these lines? (This is a bit embarrassing, but I don't know how to cleanly put this into the end of text token block; what I tried doesn't compile.)

if (params.antiprompt.size() != 0) {
    fprintf(stderr, "\n");
}

Edit:

        // end of text token
        if (embd.back() == EOS_TOKEN_ID) {
            if (params.interactive) {
                is_interacting = true;
                embd.back() = NEWLINE_TOKEN_ID;
                last_n_tokens.back() = NEWLINE_TOKEN_ID;
                if (params.antiprompt.size() != 0) {
                    fprintf(stderr, "\n");
                }
            } else {
                fprintf(stderr, " [end of text]\n");
                break;
            }
        }

I think this works?

Edit 2: I think this is the best I can get this. I noticed that if it printed new lines when not in reverse prompt mode, it would sometimes add its own new line and then my manual new line (so basically an empty, abrupt new line out of nowhere). But in reverse prompt mode, that manual newline seems mandatory because it won't print one itself when giving me back control upon the end of text token.

@slaren (Member) commented Mar 20, 2023

Here is a suggestion: notice that the token is generated in main.cpp:~1003, in this line:

id = llama_sample_top_p_top_k(vocab, logits.data() + (logits.size() - n_vocab), last_n_tokens, repeat_penalty, top_k, top_p, temp, rng);

At this point you could check if the generated token is eos and replace it with a newline, for example:

if (id == EOS_TOKEN_ID && params.interactive) {
    id = NEWLINE_TOKEN_ID;
}

This should remove the need to print the new line or do anything else yourself, since it will be treated and printed automatically later on as if it came from the model.

fix formatting of reverse prompts so they don't end up at the end of the current line while not introducing unnecessary new lines otherwise
@rabidcopy (Contributor Author) commented Mar 20, 2023

Here is a suggestion: notice that the token is generated in main.cpp:~1003, in this line:

id = llama_sample_top_p_top_k(vocab, logits.data() + (logits.size() - n_vocab), last_n_tokens, repeat_penalty, top_k, top_p, temp, rng);

At this point you could check if the generated token is eos and replace it with a newline, for example:

if (id == EOS_TOKEN_ID && params.interactive) {
    id = NEWLINE_TOKEN_ID;
}

This should remove the need to print the new line or do anything else yourself, since it will be treated and printed automatically later on as if it came from the model.

Ah, I see. I didn't see this until after pushing another change. I'll look into this. Thank you.

Edit: Correct me if I'm wrong, but wouldn't this be getting very close to what the --ignore-eos argument does? 50fae10

@slaren (Member) commented Mar 20, 2023

Correct me if I'm wrong, but wouldn't this be getting very close to what the --ignore-eos argument does?

Not entirely, --ignore-eos prevents eos from being sampled at all in the first place by setting its logit (more or less odds) to zero. The model doesn't actually return one specific token, it returns the odds of all the possible tokens and then one is sampled randomly based on these odds. So any other token could be sampled instead, based on the odds returned by the model. What you want to do here is different, you want to allow eos to be sampled and then pretend that it was actually a new line.
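
As a rough sketch of that distinction (illustrative only; the parameter name and logit indexing here are assumptions, not the literal main.cpp code):

// --ignore-eos (as described above): clear the EOS logit before sampling,
// so EOS can never be picked in the first place
if (params.ignore_eos) {
    logits[logits.size() - n_vocab + EOS_TOKEN_ID] = 0.0f;
}

// this PR: allow EOS to be sampled, then pretend it was a newline
if (id == EOS_TOKEN_ID && params.interactive) {
    id = NEWLINE_TOKEN_ID;
}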

@rabidcopy (Contributor Author) commented Mar 20, 2023

Correct me if I'm wrong, but wouldn't this be getting very close to what the --ignore-eos argument does?

Not entirely, --ignore-eos prevents eos from being sampled at all in the first place by setting its logit (more or less odds) to zero. The model doesn't actually return one specific token, it returns the odds of all the possible tokens and then one is sampled randomly based on these odds. So any other token could be sampled instead, based on the odds returned by the model. What you want to do here is different, you want to allow eos to be sampled and then pretend that it was actually a new line.

Hmm, I'm trying what you suggested and I cannot get it to throw an end of text token at all now, and it never gives back interactive control. It just endlessly generates very similarly to --ignore-eos now. Still trying just in case it's a placebo.

@slaren (Member) commented Mar 20, 2023

Ah, I think you also need to add is_interacting = true; to force it to return control to the user.
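
Putting the two pieces together at the sampling site, a minimal sketch (using the names already used in this thread; illustrative rather than the final merged code):

id = llama_sample_top_p_top_k(vocab, logits.data() + (logits.size() - n_vocab), last_n_tokens, repeat_penalty, top_k, top_p, temp, rng);

// if EOS was sampled in interactive mode, treat it as a newline
// and return control to the user
if (id == EOS_TOKEN_ID && params.interactive) {
    id = NEWLINE_TOKEN_ID;
    is_interacting = true;
}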

@rabidcopy (Contributor Author)

Ohh, this is coming before the if (embd.back() == EOS_TOKEN_ID) { line. I didn't realize your approach bypasses that entirely. Back to more testing then.

@rabidcopy (Contributor Author)

Ouch, new problem with that approach. When using a reverse prompt, it does not print the reverse prompt and simply prints a new empty line with control given back to the user. However, that is a problem with the other approach as well. I noticed that it seems to only print the first token of the reverse prompt; it misses the ":" and only prints "User", which is its own token.

@slaren (Member) commented Mar 20, 2023

The other approach allows it to generate one more token before returning control to the user (since the logic happens at the end of the loop). In some cases it could be the beginning of the reverse prompt, but it would not be guaranteed.

@rabidcopy (Contributor Author) commented Mar 20, 2023

Well, trying to get something together. I feel the changes I made are still better than the former, but I want the reverse prompt stuff to work well with these changes.

Edit: This might just have to do; I don't think there's a good way to append the reverse prompt to the output when an end of text token is reached and replaced with a newline token, especially considering there can be multiple reverse prompts. I don't think the user typing out their name when given back control is too terrible, especially since it only happens when an end of text token is reached, which isn't desired to begin with. Currently this is how it looks.

Friendbot: End of Story! That'll be $10 please. [end of text]
User: Hi
Friendbot: Hello there! What can I do for you?
User:end the story again lol
Friendbot: Oh no... not that again! You know how much it costs to have a conversation like this one!
User:I'll pay the fee. Please end it.
Friendbot: Well then, here we go! The End! [end of text]
█

@rabidcopy (Contributor Author) commented Mar 22, 2023

Everything but tokenizing and injecting the reverse prompt is now up to date with the API refactor. Not sure how to tokenize the reverse prompt now.

Edit: Think I've figured it out, this PR should be ready shortly.

this doesn't seem right though
@rabidcopy (Contributor Author) commented Mar 22, 2023

This works and doesn't crash, but I don't like my solution. (This is probably really simple.)

    // tokenize the first reverse prompt
    auto first_antiprompt = ::llama_tokenize(ctx, params.prompt, false);
    if (!params.antiprompt.empty()) {
        auto first_antiprompt = ::llama_tokenize(ctx, params.antiprompt.front(), false);
    }

I need first_antiprompt to exist to account for the injection that comes later. But I don't know how to do the equivalent of std::vector<llama_vocab::id> first_antiprompt; after the refactor. So right now this is just a hacky solution that tokenizes the normal prompt once just so first_antiprompt is declared.
Any suggestions would be much appreciated.
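
For what it's worth, one possible shape for the declaration, as a minimal sketch (assuming the refactored API exposes a llama_token id type and that ::llama_tokenize returns a std::vector of them):

// declare the vector unconditionally so it exists for the injection later,
// and only tokenize the first reverse prompt if one was actually given
std::vector<llama_token> first_antiprompt;
if (!params.antiprompt.empty()) {
    first_antiprompt = ::llama_tokenize(ctx, params.antiprompt.front(), false);
}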

Edit: Actually, the inject part isn't working at all now. It doesn't seem to want to work if the reverse prompt tokenization is declared earlier on.

@rabidcopy (Contributor Author) commented Mar 22, 2023

I think I finally have this working and it's good to merge.

Cillian: Hi. What can I do?
You: write me a short sentence about a glass of water and then end the session please, thank you.
Cillian: Ok. Here's your glass of water!
Cillian: It's blue. The glass has a handle on it. There are no ice cubes in it.
You: Great, now end the session for me. Thank you very much.
Cillian: Ending this session now. Bye! [end of text] [reverse prompt injected successfully]
You: █

Cillian: Hi. What can I do?
You:write me a short poem about water then end the session, thank you.
Cillian: Okay. Water is like life. It flows through your veins and it's what keeps us alive. Water makes everything better.
Cillian: Yes sir! Thank you for this wonderful day.
You: Thank you. Good job. Bye now.
Cillian: Have a good day sir. [end of text] [reverse prompt injected successfully]
You:█

Double checked: multiple reverse prompts work fine and it injects just the first one. No crash when not using a random prompt, and no unintended behavior that I can see.
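
For reference, a rough sketch of the shape this takes at the sampling site (names such as llama_token_eos(), NEWLINE_TOKEN_ID, embd_inp, and first_antiprompt follow this thread and my reading of the refactored API; treat it as illustrative rather than the exact merged diff):

// in interactive mode, replace a sampled end of text token with a newline and
// queue the tokenized first reverse prompt so it prints before control returns
if (id == llama_token_eos() && params.interactive) {
    id = NEWLINE_TOKEN_ID;
    if (!params.antiprompt.empty()) {
        embd_inp.insert(embd_inp.end(), first_antiprompt.begin(), first_antiprompt.end());
    }
}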

@rabidcopy requested a review from tjohnman on March 22, 2023 at 22:54
@rabidcopy changed the title from "Improve interactive mode's coherence after EOS" to "Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode" on Mar 23, 2023
Co-authored-by: Georgi Gerganov <[email protected]>
@rabidcopy (Contributor Author)

I think this is finally ready to go. No issues with it that I can see, and the reverse prompt injection works consistently. Appreciate all the help!
