-
Notifications
You must be signed in to change notification settings - Fork 11k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Control vectors in server #6289
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to see control vectors implemented on the server. Thank you for that. But let's keep it it simple, no need for the get and post /vectors
route at runtime.
Also, please implement the control_vectors.feature
in the server test framework within the PR.
Then, we need to assess the impact of control vectors on the server performances.
I couldn't have much time to look into details, but can you try:
|
Yeah, that's what I initially tried. Originally the So it seems something has to be cleaned up / reinitialized when wanting to apply a different control vector later. But I don't know what exactly. I just don't understand all that lower-level stuff that's going on there. So I simply tried this brutish method in my last commit to see what happens. Removing the assert guards and forcefully calling Loading control vectors with the guards in place (no repeated call of
But swapping out control vectors live and being able to change them akin to samplers was my main motivation. |
Sorry, I have no idea how that works, and when I try to install the python requirements I get errors thrown at me:
Maybe you could take over from here? I'm afraid of snakes. |
@phymbert I just noticed one peculiar behaviour: Sometimes the control vector has no effect. I think that happens when a request comes in before the initialization of the model, context + vector is completed. (The server does not respond in that case, but it affects its behaviour later on when everything is initialized and requests are served: The control vector specified through command line argument just has no effect then.) |
No sorry, IMHO adding this test is straight forward: add a new step and new server params and copy a completion scenario. For python, just create a python venv. |
Reopening this PR again. But some anon suggested I should simply provide a good argument why this should be merged as is. I see no fundamental difference here for these control vector arguments and why they should require special testing in server.cpp when the functionality is the same in main.cpp. So take it or leave it. I think it makes sense to just merge this. |
And about my original intention of adding hot-reloading for control vectors in server.cpp: |
I am sorry you feel it this way, adding this feature is great, but we aim to provide a production ready server, not a playground. A feature is nothing without proper test implementation in order to maintain it. You cannot ask people to do tests for you. As you do not "care enough", I am adding the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is any damage to adding a command line option to the server that already exists everywhere else on llama.cpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggerganov Georgi, AFAIK there is no emergency to support control vectors in the server. I strongly believe adding feature even few lines of code without testing them has an impact on the overall long term software quality.
If we aim to develop an efficient LLM serving solution, testing is mandatory.
Either the test framework is not adapted and one should think to replace it with something more appropriate for the community either we should maintain it.
Also adding a test scenario will help people to understand how to configure the feature and what is expected.
If the author is not welling to do the test: I may have a look the day I need control vectors. Although it's not a real passion to test other people development.
I have added for traceability as good first issue:
Yes, there is no rush to add this functionality to |
Thanks for your understanding. Let's discuss this again if someone really needs this to master one day. Meanwhile, the linked issue is opened for contributions. |
ef194c4
to
2506fed
Compare
Okay. Here is all I've got: Anon shared a patch here that fixes the memory allocation/freeing bugs in To address the security concern about accessing arbitrary control vector files through that hot-swapping endpoint, it is now necessary to provide predefined options for control vectors that can be loaded:
The server will decide based on the specified path string whether it is interpreted as a vector file or a directory. A slash will be automatically added to the name of a directory option. Hot-reloading works the same as before, but without segfaults. That is all I'm willing and able to do and what I will be using myself. If someone likes to fix it up with all the necessary compliance bullshit to merge it into master that would of course be very nice. Don't forget to give credit to [email protected] for his contributions of commits 181879f, 9914014, and d0304f7. |
I'm curious, how does this deal with parallel execution? I assume that the applied control vector affects all "conversations" in a multi-user context? |
Yes, it does. I guess it would be possible to decouple user settings for control vectors by adding control vector options to the completion endpoint instead of just this "global" control vector endpoint. It looks a bit complicated because of the async stuff with the request queue going on in |
Control vectors are context-independent, because it is applied to the embedding vector between layers, not the self-attention mechanism. |
@trollkotze Thanks for picking this up (and catching the memory bugs in my original implementation 😅) I've been on vacation so haven't been checking progress, but I'm experienced w/ Python, would be happy to coordinate to get this over the line for merging! (I'm @vgel on Discord if you use it.) |
@vgel Thanks! That would be great if you could help out here and do the python stuff or whatever else there is to do. |
@phymbert I also invited you to my fork as collaborator in case you feel motivated and want to fiddle with it there. |
What's the pending work on this, just test cases, or anything more? |
Could we not separate off the hot-swapping part into a 2nd PR and start with just having a command line option to load the control vector at launch time? This would allow more testing for the more risky stuff but still let people use the control vectors now. |
My nitpicks/comments/questions which may or may not make sense:
|
This PR implements options for loading control-vectors in server.cpp same as it already exists in main.cpp on start-up (only adding cli options here for what was already implemented), as well as hot-swapping control vectors at runtime in server.cpp.
The PR also includes a fix for a memory allocation/freeing bug in
llama_load_control_vector
. Credit for that goes to [email protected] for his contributions of commits 181879f, 9914014, and d0304f7.New CLI options
In addition to the same CLI options that exist for main.cpp there are these additional options for server.cpp to specify named file or directory options from which control vectors can be loaded at runtime:
--control-vector-option name /path/to/vector.gguf
--control-vector-option name path/to/dir
--control-vector-option name path/to/dir/
The server will decide based on the specified path string whether it is interpreted as a vector file or a directory. A slash will be automatically added to the name of a directory option.
New endpoints for the server
GET /control-vector-options
will return a json array of strings.So e.g.
--control-vector-option vectors path/to/dir --control-vector-option victor path/to/some/vector.gguf
will result in the listed options:["vectors/", "victor"]
, meaning that the user would only be allowed to loadvectors/anything/goes/here
(except path traversal) orvictor
.GET /control-vectors
will return the currently applied control vector combination and layer range parameters (the latter can only be specified on load for now, or will default to full range) like so:POST /control-vectors
accepts an array of vector parameters in the same format, which will replace the currently applied control vector:Note that the .gguf here is optional for vectors inside allowed directories. It will be added to the file name to be loaded if not included. The assumption is that all control vector files end in
.gguf
.Path traversal is guarded against to the extent I am aware of.
The original message below here is outdated and just left for historical context: