Llama adds this extra token when the first character is '\n', and this compromises the stopping criteria, so we just remove it #2606

Closed
1 task done
ArEnSc opened this issue Jun 10, 2023 · 1 comment
Labels
bug Something isn't working

ArEnSc commented Jun 10, 2023

Describe the bug

I was auditing the code and came across the snippet below.
As far as I can tell it does nothing: 29871 is the character \ and 13 is the character n, and a '\n' is added to the string afterwards.
So I wonder whether everything works fine without this code anyway.
By the way, the token it adds is 1, or `<s>`.

    # Llama adds this extra token when the first character is '\n', and this
    # compromises the stopping criteria, so we just remove it
    if type(shared.tokenizer) is transformers.LlamaTokenizer and input_ids[0][0] == 29871:
        input_ids = input_ids[:, 1:]
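
To make the concern concrete, here is a minimal sketch of what the check does when a start token sits at index 0. It uses plain Python lists as a stand-in for the 2-D `input_ids` tensor, and the ids (1, 29871, 13) are taken from this issue rather than verified against a real tokenizer; `strip_if_first` and the id 100 are hypothetical names and values used only for illustration.

```python
BOS_ID = 1        # per this issue: the token added at the start of the sequence
EXTRA_ID = 29871  # the id the original check looks for
NEWLINE_ID = 13   # the id this issue associates with the newline
OTHER_ID = 100    # hypothetical placeholder for "some other token"


def strip_if_first(input_ids, token_id=EXTRA_ID):
    """Mimic the original check on a batch of token-id lists.

    Drops the first column only when the first token of the first row
    equals token_id; otherwise returns the input unchanged.
    """
    if input_ids[0][0] == token_id:
        return [row[1:] for row in input_ids]
    return input_ids


with_bos = [[BOS_ID, EXTRA_ID, NEWLINE_ID, OTHER_ID]]
without_bos = [[EXTRA_ID, NEWLINE_ID, OTHER_ID]]

# With a start token at index 0 the condition is never true.
print(strip_if_first(with_bos))
# Without it, the leading 29871 is removed.
print(strip_if_first(without_bos))
```

If the encoder always prepends token 1, the first case is the one that applies, which would explain why the check appears to have no effect.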

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

The code should be

    if type(shared.tokenizer) is transformers.LlamaTokenizer and input_ids[0][1] == 29871:
        input_ids = input_ids[:, 1:]

to remove the character.
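
One caveat with the proposal above: `input_ids[:, 1:]` drops the token at index 0, not the one at index 1 that the condition inspects. A minimal sketch of removing the token at index 1 while keeping the start token follows. It again uses plain lists in place of the tensor; `drop_second_token` is a hypothetical helper, and the ids come from this issue, not from a verified tokenizer.

```python
EXTRA_ID = 29871  # the id the check looks for, as quoted in this issue


def drop_second_token(input_ids, token_id=EXTRA_ID):
    """Remove the token at index 1 of every row if the first row has token_id there.

    The token at index 0 (assumed here to be the start token) is kept.
    """
    if len(input_ids[0]) > 1 and input_ids[0][1] == token_id:
        return [row[:1] + row[2:] for row in input_ids]
    return input_ids


# Hypothetical batch: start token 1, then 29871, then 13 and a placeholder id.
batch = [[1, 29871, 13, 100]]
print(drop_second_token(batch))

# For a torch tensor, the same idea could presumably be written as:
#   torch.cat((input_ids[:, :1], input_ids[:, 2:]), dim=1)
```

Whether dropping that token is desirable at all is a separate question; this only shows how the slice would need to change to match the condition.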

Screenshot

No response

Logs

none

System Info

none
@ArEnSc ArEnSc added the bug Something isn't working label Jun 10, 2023

ArEnSc commented Jun 10, 2023

But wouldn't we want that token at the start, or to act as a terminating condition?
