Llama adds this extra token when the first character is '\n', and this compromises the stopping criteria, so we just remove it #2606

ArEnSc · 2023-06-10T04:29:01Z

Describe the bug

I was auditing the code and came across the snippet below.
As far as I can tell it does nothing, since 29871 is the character \ and 13 is n, and a '\n' gets added to the string afterwards anyway.
So I wonder whether everything works fine without this code.
The token it adds is 1 (`<s>`), by the way.

    # Llama adds this extra token when the first character is '\n', and this
    # compromises the stopping criteria, so we just remove it
    if type(shared.tokenizer) is transformers.LlamaTokenizer and input_ids[0][0] == 29871:
        input_ids = input_ids[:, 1:]

Is there an existing issue for this?

I have searched the existing issues

Reproduction

The code should be

    if type(shared.tokenizer) is transformers.LlamaTokenizer and input_ids[0][1] == 29871:
        input_ids = input_ids[:, 1:]

to remove the character.
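To make the indexing question concrete, here is a minimal sketch that uses plain Python lists in place of the `input_ids` tensor. It assumes, as described above, that a start token with ID 1 is prepended, so 29871 would sit at index 1. The token sequence, the helper names `strip_if` and `drop_token_at`, and the trailing ID are hypothetical and only for illustration; this is not the webui's actual code path.

```python
# Plain-list stand-in for a batch of token IDs (batch size 1).
BOS_ID = 1        # start token the issue says gets prepended
EXTRA_ID = 29871  # token ID the existing check looks for
NEWLINE_ID = 13   # token ID the issue associates with the newline
OTHER_ID = 15043  # arbitrary placeholder for the rest of the prompt


def strip_if(input_ids, check_index):
    """Mimic `input_ids = input_ids[:, 1:]` guarded by a check at check_index."""
    if input_ids[0][check_index] == EXTRA_ID:
        return [row[1:] for row in input_ids]
    return input_ids


def drop_token_at(input_ids, index):
    """Remove only the token at `index` from every row."""
    return [row[:index] + row[index + 1:] for row in input_ids]


# Hypothetical encoding of a prompt that starts with '\n'.
ids = [[BOS_ID, EXTRA_ID, NEWLINE_ID, OTHER_ID]]

current = strip_if(ids, 0)   # check used in the existing code
proposed = strip_if(ids, 1)  # check proposed in this issue
targeted = drop_token_at(ids, 1) if ids[0][1] == EXTRA_ID else ids

print(current)   # unchanged: index 0 holds BOS_ID, so the check never fires
print(proposed)  # BOS_ID is dropped, EXTRA_ID is still present
print(targeted)  # EXTRA_ID is dropped, BOS_ID is kept
```

Under these assumptions, changing only the index in the check still slices off the first column, so it would drop the start token rather than 29871. Whether that matters depends on how the tokenizer is actually called, which this sketch does not verify.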

Screenshot

No response

Logs

none

System Info

none

ArEnSc · 2023-06-10T04:31:32Z

But wouldn't we want that token at the start, or as a terminating condition?

ArEnSc added the bug label Jun 10, 2023

ArEnSc closed this as completed Jun 11, 2023

haileyschoelkopf mentioned this issue Jan 19, 2024

Deal with _encode_pair() / Llama token 29871 / SPIECE_UNDERLINE better EleutherAI/lm-evaluation-harness#1322

Draft
