Examples/Server API change breaks prompt as JSON array. #4676
It seems like #4232 changed the behavior of the prompt field in the completions endpoint: an array is now interpreted as multiple independent prompts rather than as a single prompt. This means it's impossible with the current code to pass a single completion request whose prompt is an array with multiple elements.
In my case, I wanted to do this so that I could intersperse text and raw tokens in the prompt without first making tokenize requests to tokenize the text. I think at the very least the new behavior should be documented. Moreover, I think overloading the prompt field with this behavior should be reconsidered. cc @ziedbha, who originally added this.
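For illustration, a request of roughly this shape is what broke (a sketch; the token IDs are made up, and the exact endpoint path, /completion vs. /completions, is an assumption based on this thread):

```sh
# Old behavior: the whole array is one prompt (strings are tokenized,
# numbers are taken as raw token IDs, everything is concatenated).
# Behavior after #4232: each array element becomes its own prompt,
# each requiring its own slot.
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": ["Once upon a time", 1234, 5678, ", the end"], "n_predict": 32}'
```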
Excellent, thank you for confirming my issue.
Apologies, I haven't had time to look at this properly. What I can do is revert to the old behavior to unblock you, and revisit this later. Sorry again.
Commenting out this block is sufficient to restore the old behavior: llama.cpp/examples/server/server.cpp, lines 1408 to 1413 at commit de473f5.
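As a quick way to tell which behavior a local build has, here is a minimal smoke test (a sketch, assuming a server running on the default port with a single slot):

```sh
# One request with a two-element prompt array against a single-slot server.
# Old behavior: returns one completion for the combined prompt.
# New behavior: the second element needs a second slot, so the request
# stalls or fails with a "no slots available"-style error.
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": ["Hello", " world"], "n_predict": 8}'
```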
Personally, I can just work off a local commit, so there's no need to do any reverting on my account. Not sure about @jboero.
No need to revert if this is the way it's meant to be. Or maybe version the API to support both?
It is. It would be great to have a fully functional server with all the bells and whistles, but maintaining it requires extra effort - even more so without a way to create a CI with some simple tests. I'm planning to refactor a lot of the code, but it's not really high-priority atm since there are more important things to be done for the core lib first. Hopefully the community will help out in the meantime (#4216)
I'm happy to help maintain and expand this. I think it's one of the simplest and greatest bits of code I've ever used.
Closing this for now as it seems this was a feature change.
My app uses the examples/server API /completions endpoint, and recently it stopped working, giving "no slots available" errors even though a slot is clearly available. It seems one slot is now required per item in the prompt array. Also, instead of a simple message content, streaming results are now provided as an object containing an array of results.
Was there a documented change for this? Maybe it's better to use the ChatGPT-compatible /v1/chat/completions endpoint instead? It seems the built-in API isn't versioned.
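If the per-item slot behavior is intended, one workaround is to start the server with as many parallel slots as the largest prompt array you send (a sketch; the model path is a placeholder, and the flag spelling should be checked against ./server --help for your build):

```sh
# Serve with 4 processing slots so a prompt array of up to 4 elements
# can be handled in one request (placeholder model path).
./server -m models/your-model.gguf --parallel 4
```

Alternatively, the OpenAI-compatible /v1/chat/completions endpoint follows an externally specified schema, which may make client code less sensitive to changes like this one.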