[server] phi-3 uses <|endoftext|> instead of <|end|> when applying chat template in /chat/completions #7432

andysalerno · 2024-05-21T06:53:47Z

When using phi-3 without the option --chat-template phi3, the tokenization is incorrect.

For example, if I do use --chat-template phi3, here is the log output when I send the message "hi":

{
    "level": "VERB",
    "function": "update_slots",
    "line": 1954,
    "msg": "prompt tokenized",
    "id_slot": 0,
    "id_task": 1,
    "n_ctx": 8192,
    "n_keep": 0,
    "n_prompt_tokens": 7,
    "prompt_tokens": "<s><|system|><|end|><|user|> hi<|end|><|assistant|>"
}

Actually, the extra space after <|user|> is concerning: it should be a newline. Maybe that's just an artifact of how the log message is formatted.

But here's what happens when the --chat-template phi3 is omitted:

{
    "level": "VERB",
    "function": "update_slots",
    "line": 1954,
    "msg": "prompt tokenized",
    "id_slot": 0,
    "id_task": 0,
    "n_ctx": 8192,
    "n_keep": 0,
    "n_prompt_tokens": 11,
    "prompt_tokens": "<s><|system|><|endoftext|> \n<|user|> hi<|endoftext|> \n<|assistant|>"
}

Note that it uses <|endoftext|> (wrong) instead of <|end|> (correct), which causes really bad generation.

I am using the gguf straight from Microsoft, so I guess it is as official as it gets:

https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf

Possibly the problem is in the gguf itself? Even so, it's weird that using the "official" gguf results in incorrect tokenization output from the template applied.

Now, you could just always use --chat-template phi3. But my expectation is that the phi3 chat template should be picked up automatically by the detection heuristic when using the canonical/official Phi-3 models, since they purport to support phi3.

tristandruyen · 2024-05-22T00:59:56Z

So regarding your note about the space, the code actually uses a newline after <|user|> as you can see here, so it seems like a display artifact.

The template auto-detection seems broken: it mistakenly selects the zephyr chat template because the model's template contains <|user|>:

    // ....
    } else if (tmpl == "zephyr" || tmpl.find("<|user|>") != std::string::npos) {
        // zephyr template
        for (auto message : chat) {
            ss << "<|" << message->role << "|>" << "\n" << message->content << "<|endoftext|>\n";
        }
        if (add_ass) {
            ss << "<|assistant|>\n";
        }
    }

And this check happens before the Phi-3 template check:

    // ....
    } else if (tmpl == "phi3" || (tmpl.find("<|assistant|>") != std::string::npos && tmpl.find("<|end|>") != std::string::npos )) {
        // Phi 3
        for (auto message : chat) {
            std::string role(message->role);
            ss << "<|" << role << "|>\n" << trim(message->content) << "<|end|>\n";
        }
        if (add_ass) {
            ss << "<|assistant|>\n";
        }
    }

This should be a pretty easy fix though, I'll make a PR.
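To make the ordering problem concrete, here is a minimal standalone sketch. It is not the llama.cpp code and not necessarily what the eventual PR does: the enum, the function names, and the two template strings are hypothetical stand-ins made up for illustration. It only shows that a template containing both <|user|> and <|end|> is classified as zephyr when the zephyr check runs first, and as Phi-3 when the more specific check runs first.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration only; not part of llama.cpp.
enum class ChatTemplate { Zephyr, Phi3, Unknown };

inline bool contains(const std::string & haystack, const std::string & needle) {
    return haystack.find(needle) != std::string::npos;
}

// Simplified stand-ins for the templates stored in model metadata.
// These are assumptions, not the exact strings shipped with any model.
const std::string kPhi3Like   = "<|user|>\n{{content}}<|end|>\n<|assistant|>\n";
const std::string kZephyrLike = "<|user|>\n{{content}}</s>\n<|assistant|>\n";

// Order described in the thread: the zephyr heuristic runs first, so any
// template containing "<|user|>" is classified as zephyr.
inline ChatTemplate detect_zephyr_first(const std::string & tmpl) {
    if (tmpl == "zephyr" || contains(tmpl, "<|user|>")) {
        return ChatTemplate::Zephyr;
    }
    if (tmpl == "phi3" || (contains(tmpl, "<|assistant|>") && contains(tmpl, "<|end|>"))) {
        return ChatTemplate::Phi3;
    }
    return ChatTemplate::Unknown;
}

// Reordered: the more specific Phi-3 heuristic runs before the zephyr one.
inline ChatTemplate detect_phi3_first(const std::string & tmpl) {
    if (tmpl == "phi3" || (contains(tmpl, "<|assistant|>") && contains(tmpl, "<|end|>"))) {
        return ChatTemplate::Phi3;
    }
    if (tmpl == "zephyr" || contains(tmpl, "<|user|>")) {
        return ChatTemplate::Zephyr;
    }
    return ChatTemplate::Unknown;
}

// Small self-check that exercises both orderings.
inline void check_detection_order() {
    // Zephyr-first ordering misclassifies the Phi-3 style template.
    assert(detect_zephyr_first(kPhi3Like) == ChatTemplate::Zephyr);
    // Phi-3-first ordering classifies both templates as intended.
    assert(detect_phi3_first(kPhi3Like) == ChatTemplate::Phi3);
    assert(detect_phi3_first(kZephyrLike) == ChatTemplate::Zephyr);
}
```

Calling check_detection_order() from a main() is enough to try it. The reordering works in this sketch because "<|end|>" is not a substring of "<|endoftext|>", so a zephyr-style template does not trip the Phi-3 check. Other templates that contain both markers would still need their own handling.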

andysalerno added the bug-unconfirmed label May 21, 2024

tristandruyen mentioned this issue May 22, 2024

Fix phi3 chat template confusion with zephyr #7449

Merged

ThatcherC mentioned this issue May 22, 2024

Phi 3 medium/small support #7439

Closed

ngxson closed this as completed in #7449 May 23, 2024
