Examples/Server API change breaks prompt as JSON array. #4676
It seems like #4232 changed the behavior of the prompt field in the completions endpoint: an array is now interpreted as multiple independent prompts rather than as a single prompt. This means it's impossible with the current code to pass a single completion request whose prompt is an array with multiple elements.
In my case, I wanted to do this so that I could intersperse text and raw tokens in the prompt without first making tokenize requests to tokenize the text. I think at the very least the new behavior should be documented. Moreover, I think overloading the prompt field with this behavior should be reconsidered. cc @ziedbha, who originally added this.
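For illustration, a request of roughly this shape is what broke (a sketch; the token IDs are made up, and the exact endpoint path, /completion vs. /completions, is an assumption based on this thread):

```sh
# Old behavior: the whole array is one prompt (strings are tokenized,
# numbers are taken as raw token IDs, everything is concatenated).
# Behavior after #4232: each array element becomes its own prompt,
# each requiring its own slot.
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": ["Once upon a time", 1234, 5678, ", the end"], "n_predict": 32}'
```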
Excellent, thank you for confirming my issue.
Apologies, I haven't had time to look at this properly. What I can do is revert to the old behavior to unblock you, and revisit this later. Sorry again.
Commenting out this block is sufficient to restore the old behavior: llama.cpp/examples/server/server.cpp, lines 1408 to 1413 at commit de473f5.
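As a quick way to tell which behavior a local build has, here is a minimal smoke test (a sketch, assuming a server running on the default port with a single slot):

```sh
# One request with a two-element prompt array against a single-slot server.
# Old behavior: returns one completion for the combined prompt.
# New behavior: the second element needs a second slot, so the request
# stalls or fails with a "no slots available"-style error.
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": ["Hello", " world"], "n_predict": 8}'
```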
Personally, I can just work off a local commit, so there's no need to do any reverting on my account. Not sure about @jboero.
No need to revert if this is the way it's meant to be. Or maybe version the API to support both?
It is. It would be great to have a fully functional server with all the bells and whistles, but maintaining it requires extra effort - even more so without a way to create a CI with some simple tests. I'm planning to refactor a lot of the code, but it's not really high-priority atm since there are more important things to be done for the core lib first. Hopefully the community will help out in the meantime (#4216)
I'm happy to help maintain and expand this. I think it's one of the simplest and greatest bits of code I've ever used.
Closing this for now as it seems this was a feature change.
My app uses the examples/server API /completions endpoint, and recently it stopped working, giving "no slots available" errors even though a slot is clearly available. It seems one slot is now required per item in the prompt array. Also, instead of a simple message content, streaming results are now provided as an object containing an array of results.
Was there a documented change for this? Maybe it's better to use the ChatGPT-compatible /v1/chat/completions endpoint instead? It seems the built-in API isn't versioned.
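If the per-item slot behavior is intended, one workaround is to start the server with as many parallel slots as the largest prompt array you send (a sketch; the model path is a placeholder, and the flag spelling should be checked against ./server --help for your build):

```sh
# Serve with 4 processing slots so a prompt array of up to 4 elements
# can be handled in one request (placeholder model path).
./server -m models/your-model.gguf --parallel 4
```

Alternatively, the OpenAI-compatible /v1/chat/completions endpoint follows an externally specified schema, which may make client code less sensitive to changes like this one.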