Use llm_samplers crate for sampler backend #359
Conversation
Random extra stuff: I'm still thinking about the next thing to add. Requests for adding other samplers not yet implemented (I actually am not aware of any) or API changes are welcome. I also don't have a very good track record for maintaining projects after the fun part of adding fancy stuff is over. I still have some more stuff I want to do, but after that, if Rustformers wants to get added as a co-owner for the crate, I'd be open to it.
This is an exciting development. I've been looking forward to experimenting with Mirostat V2, and this should make it quite effortless.
It might be a good idea to approach @philpax about incorporating this under the rustformers umbrella.
To determine the vocabulary size, we either need to extract the hyperparameters from the model or load the vocabulary.
Unit testing samplers might prove a bit challenging, especially once we begin to chain them. My only concern with the current implementation is the significant increase in CLI parameters. However, I suppose it's a necessary trade-off if we're to support multiple different samplers via the command line.
It's usually nothing as definite as that; I just get interested in other stuff. I also don't really want to completely give up control of my project. I guess what I'm talking about is more like giving others the ability to help maintain it.
Right. The problem as far as I could see is that the information isn't available at the time the samplers are getting set up. There also isn't really a convenient way to change it later on. This shouldn't be too much of a blocker, though; in the worst case it can stay a commandline option.
Well, as long as the individual samplers work correctly, they should also work when chained (as long as the chain is meaningful, of course). So what I need is more comprehensive tests for the individual samplers. You can take a look at the existing tests in the llm-samplers repo.
I agree. I basically just copied the existing set of CLI parameters. One thing that might make it a bit more manageable is just to add something like a single option that configures the whole sampler chain. Another approach might be to have something like a separate configuration file for sampler settings. I was actually thinking about adding something along those lines.
Fantastic work! Apologies about not getting back to you before, work got in the way.
That's totally fine. I'm happy keeping it as a dependency - if/when you want to move on, let us know and we can organise something, but I'm not fussed either way. You've done great work so far.
Yep, you can leave it as-is for now. We'll figure something out - probably revising the sampler interface to pass in the vocabulary size or something. (You could also do that, if you wanted.)
I'd suggest making tests with the current output for a given input, and then revising them if they turn out to be wrong. As you say, we don't really know what the "correct" behaviour is, so the most we can do is make sure it doesn't regress. That's what we've done with our integration tests so far - better to have something rather than nothing...
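The snapshot-style testing suggested above can be sketched like so. `greedy_sample` here is a trivial stand-in sampler invented for this example (not part of llm-samplers); the point is the shape of the test, not the sampler:

```rust
// Sketch of a pin-the-current-output regression test. The stand-in sampler
// just picks the highest-logit token (argmax).
fn greedy_sample(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

#[test]
fn pinned_output_does_not_regress() {
    // The expected value isn't derived from first principles; it's simply
    // what the sampler produced when the test was written.
    assert_eq!(greedy_sample(&[0.1, 1.5, -0.3, 0.9]), 1);
}
```

If a refactor changes the pinned value, the test flags the regression for a human to review, which is the most you can do when "correct" behaviour isn't known.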
I think that's the nicest approach that comes to mind. The many CLI options will lead to a confusing user experience, and they won't allow for chaining samplers. The only point of concern is that the discoverability of the sampler parameters on the CLI might be limited - it might end up looking like an ffmpeg command - but we can hopefully address that with documentation. I'm happy to merge once the CLI sampler parameters are reduced. Two other questions:
One thing I was thinking is a fairly easy solution: just implement my sampler trait for the existing sampler type. I think that may be the easiest way to deal with the situation.
There are tests for every type of sampler already. So the issue is more that there need to be tests that actually prove things are functioning correctly.
Do you have any idea in mind of what you'd like it to look like? If you show me an example of what you think is the ideal format for specifying the sampler params I can probably add something to parse it that way. (Or tell you why it won't work, if there's some issue.)
It doesn't need to be kept around; however, I wouldn't really suggest removing it until you're really confident the new samplers are working the way you expect/hope. I wrote the Rust versions by looking at the llama.cpp code and trying to understand what it was doing to the data. I didn't write them from a deep understanding of how the samplers themselves work. I did port the tests from llama.cpp as well (and my versions pass them); however, they aren't exactly very rigorous, and I know of at least one example of a bug in the llama.cpp samplers that the existing tests didn't catch. Anyway, that's why I took a pretty conservative approach with adding the new samplers.
Yes, that's correct. Like you said, it could be moved out into the main scope if the existing sampler was removed (or even if it wasn't).
Yes, I was also considering that, or extending the trait with a method that provides more information to the sampler post facto. I'm fine with either, but I have a slight preference for the latter because I assume the former would involve downcasting for Mirostat1. If it doesn't, ignore me. ...that being said, unless I'm mistaken, the model already exists and has been loaded by the time the samplers are set up.
Right, yeah, not sure we can offer much help there. A lot of the work in the wider field is cowboy engineering, so there aren't really any existing test cases that we can use that I'm aware of. The original papers might have something, but I'd expect their results to be more summaries than specific scenarios.
Not at all. I was nodding approvingly at the example you gave here:
Something like that with a reasonable set of defaults and some documentation would be fine by me. I can't think of any improvements to that at this time; I suspect any such improvements will shake out of use with the interface, and that we won't be able to guess at it before then.
Honestly... see my previous comment about cowboy engineering. If there's a bug with the samplers, I doubt we'd catch it right now either way. You have my full permission to delete the old sampler and go all-in on llm-samplers. Part of the reason I'd prefer to remove the old sampler(s) entirely is because, well, it's incredibly confusing to have two parallel sampler implementations.
I'm working on revisions based on the conversation here but it's going to take a few days. Probably Thursday or Friday.
Unfortunately this has taken a lot longer than expected. The changes on the llm-samplers side ended up being much more extensive than planned. I still have some testing and cleanup work to do, and then I need to port (or maybe just reimplement) this pull and fix the merge conflicts. I doubt I'll finish all that tomorrow, but possibly Monday. Anyway, I haven't forgotten about this pull even though there hasn't been any visible progress until now.
No worries, take your time. Appreciate the hard work!
Here's a question: currently everything else in the inference parameters struct is sampler-related. Depending on whether it's expected that the struct would be used for other parameters in the future, it could make sense to rename it to something more sampler-specific. Note: this pull has the merge conflicts resolved but hasn't been updated to the new version of llm-samplers yet.
The removal of the parameters in the
What do we think of this general approach? Note: there's still a considerable amount of cleanup that needs to be done before it should be merged. Right now, running inference does seem to work. I tried to hide as much of the implementation details as possible. One thing that makes things complex is that some samplers (like Mirostat) need to keep mutable state between sampling calls. The general description of how it works is in the code comments. Anyway, if the consensus is that this approach is acceptable, I'll continue with cleaning it up and getting it mergeable. Otherwise, I guess we'll see!

edit: This is now reasonably clean.
One thing I'm not so sure about is how things should look from an API standpoint, or what exactly the best location for some of the structures is. Assuming the general approach is okay, it might make sense to merge this and then have someone who actually has a vision for how it's all supposed to fit together rename/reorganize things. Not that I'm unwilling to make that kind of change if requested; I just don't really have the vision/familiarity to necessarily make the optimal choices for that kind of stuff.
Great work, I'll review this in the next day or two and get back to you.
Looks good, but `ConfiguredSamplers` being many `Option`s is a little odd. Would it be possible to use a type-erased array instead?
Thanks for the review.
Unfortunately not, at least with the approach I'm using. If it's not clear, that's a temporary structure used to hold the configured samplers before they're built into a chain (which is basically the type-erased array; you can just think of it as a list of boxed sampler trait objects). This is how it works currently: each commandline sampler definition gets parsed into the matching field of that structure, and the fields that are set are then pushed into the chain in a fixed order.
Also if it's not clear, those first two structs are only used very temporarily during initialization and result in an actual sampler chain. (BTW, don't merge this yet even if you're okay with it after this explanation. I still have a bit more cleanup to do, and I'd want to release a version of llm-samplers first.)
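For illustration, the pattern described above (a struct of `Option`al configurations flattened into an ordered chain during init) can be sketched like this. All names are hypothetical stand-ins, not the PR's actual types, and strings stand in for the boxed sampler trait objects the real chain would hold:

```rust
// Hedged sketch of the builder shape: optional per-sampler configs that get
// flattened into an ordered chain.
struct ConfiguredSamplersSketch {
    top_k: Option<usize>,
    top_p: Option<f32>,
    temperature: Option<f32>,
}

impl ConfiguredSamplersSketch {
    // Consume the temporary struct and emit the chain in a fixed order.
    fn into_chain(self) -> Vec<String> {
        let mut chain = Vec::new();
        if let Some(k) = self.top_k {
            chain.push(format!("topk(k={k})"));
        }
        if let Some(p) = self.top_p {
            chain.push(format!("topp(p={p})"));
        }
        if let Some(t) = self.temperature {
            chain.push(format!("temperature({t})"));
        }
        chain
    }
}
```

The `Option` fields exist only so each CLI flag has somewhere to land before the chain is assembled, which is why the struct is short-lived.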
Gotcha, I understand. Looking forward to it!
From that response, I'm assuming you didn't require any additional changes like moving stuff around or renaming it. This latest set of changes is mainly just cleanups and improvements to the commandline help text (I added more information, sorted it alphabetically, etc.). This should be ready to merge now (I'm assuming the automated checks will pass). Just want to add again, though, that correct behavior for the samplers hasn't really been tested extensively (and I don't really have the information to know exactly what would be "correct"). If you find anything not working as expected, please let me know.
Not sure what was up with this pull since, unless I misunderstood, there weren't any further changes required. However, in the meantime I think I came up with a better approach to the sampler construction. I've also added another sampler that I had an idea for. I'll just paste the rustdoc description:

Sequence Repetition

This sampler penalizes repeating sequences of tokens that have already been seen within a configurable window of recent tokens. Warning: Very alpha code, likely has significant bugs.

Properties:
Parameters:
I'm not sure if you'd want to include this as well, but I can if there's interest. I'm almost positive the matching algorithm has a few bugs, but it may still work well enough to be useful (it seems to in my tests). This is an approach I came up with myself, so there isn't a reference implementation. The advantage over the existing repetition and frequency/presence samplers is that those don't take context into account: they can stop specific tokens from getting repeated, but that's a pretty blunt instrument. Also, penalizing common short words like "if", "and", "or" because they get repeated frequently can cause some strange effects.
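As a rough illustration of the idea (my own reconstruction, not the crate's algorithm, and without the partial-match tolerance mentioned above): look for earlier occurrences of the current tail of the context inside the recent-token window, and penalize whichever token followed each occurrence, since picking it would extend an already-seen sequence.

```rust
// Hypothetical sketch of a sequence-repetition penalty. `tokens` is the
// context so far, `last_n` the window size, `min_len` the match length,
// and `logits` the scores to be adjusted in place.
fn seq_rep_penalty(
    tokens: &[u32],
    last_n: usize,
    min_len: usize,
    penalty: f32,
    logits: &mut [f32],
) {
    let start = tokens.len().saturating_sub(last_n);
    let window = &tokens[start..];
    if window.len() <= min_len {
        return;
    }
    // The last `min_len` tokens of the context.
    let tail = &window[window.len() - min_len..];
    // Find earlier occurrences of that tail; the token that followed each
    // occurrence would continue a repeat, so it gets penalized.
    for i in 0..window.len() - min_len {
        if &window[i..i + min_len] == tail {
            let next = window[i + min_len] as usize;
            if next < logits.len() {
                logits[next] -= penalty;
            }
        }
    }
}
```

Unlike a plain repetition penalty, this only punishes a token in the context where it would extend a repeated run, so common short words are untouched elsewhere.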
I guess @philpax is currently a bit busy working on the GGUF standard, and I don't feel comfortable merging this, so I guess we have to wait a bit. And this PR is still marked as a draft 😅
I think we can include it, as it's an optional sampler. But maybe we don't have to pass all its parameters through the CLI and can just keep it for other people to use in their projects if they want to 🤔
Hey! Sorry - as Lukas said, I've been really busy and have only been merging maintenance PRs. What you've described sounds great and I've no problems with it. I can try getting this in on the weekend, or I can wait for you to play with that. Just let me know which one works best for you.
This weekend would be fine; I think I can have my changes ready by early Sunday (or hopefully sooner).
Well, I think this would mainly be a question of whether it should be in the commandline options (or the default set of samplers, but at least with the approach I took that's essentially the same thing). A sampler can be a chain of other samplers, so library users can always add it themselves.

Also, I'm not sure if supporting the sampler via CLI but leaving out some of the options it supports would really be that useful. For some samplers, like Mirostat, there are a few options that it's very unlikely people would want to change directly. All the options in Sequence Repetition are likely to be things people would want to tweak: the length of matches, tolerance for partial matches, the window within which matching occurs. I know I'd use all of those. I'd actually really like to figure out a way to allow specifying multiple samplers of the same kind with llm, where that makes sense.

Anyway, for this part (just the SR sampler) I could still use some direction, since it's not completely clear what is desired. In a perfect world, I'll also get time to fix all potential issues there and be satisfied it will work pretty reliably, but that's definitely not guaranteed to happen before the weekend.
Okay, I think we're finally there. I was hoping to have this ready early today, but these were significant and complex changes on the llm-samplers side. I had to make constructing the samplers considerably more flexible. It's possible to specify samplers and their options on the commandline. It's also possible to add more than one instance of samplers where that makes sense (bias, for example). You'll get a reasonable error if you specify incompatible sampler settings like Locally Typical + Mirostat.
Note: It's possible to use the Sequence Repetition sampler but it doesn't currently appear in the help text. It wasn't clear what was desired regarding that. I can add it to the help text if you want, otherwise it's basically a hidden option that people who know about it can try out.
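For reference, the `name:key=value:key=value/...` sampler syntax used in this thread (e.g. `topk:k=40/topp:p=0.95`) could be parsed along these lines. This is a hypothetical sketch; the PR's real parser may differ in validation and error handling:

```rust
// Parse a '/'-separated list of sampler definitions into
// (name, [(key, value)]) pairs.
fn parse_samplers(s: &str) -> Vec<(String, Vec<(String, String)>)> {
    s.split('/')
        .filter(|part| !part.trim().is_empty())
        .map(|part| {
            let mut fields = part.split(':');
            // The first ':'-separated field is the sampler name...
            let name = fields.next().unwrap_or("").trim().to_string();
            // ...and each remaining field is a key=value option.
            let args = fields
                .filter_map(|kv| {
                    let (k, v) = kv.split_once('=')?;
                    Some((k.trim().to_string(), v.trim().to_string()))
                })
                .collect();
            (name, args)
        })
        .collect()
}
```

A real implementation would also need to reject unknown names, duplicate keys, and incompatible combinations, which is where the "reasonable error" behavior described above comes in.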
Fantastic work, I'm a big fan of the approach. Ready to merge, just have a few things I'd like changed:
- Re-export `llm-samplers` or its contents somewhere, so that library users can construct sampler chains / name the relevant types without an additional dependency on `llm-samplers`.
- Document what the default sampler chain does.
- Consider exporting `ConfiguredSamplers` and/or renaming `build_sampler` to indicate that it specifically parses a list of strings to construct a sampler chain from `ConfiguredSamplers`. At present, it's not clear that there are non-string ways to construct a sampler chain, and `build_sampler` looks like the only way.
- Another thought that comes to mind is to move all of the parsing logic (including `build_sampler`) into `llm-cli`, as there are currently CLI-specific concerns (like the `bias` handling) in there, but library users might want to use the same parsing logic. Maybe remove `build_sampler` entirely, inline its logic into the CLI (but keep `ConfiguredSamplers` public), and require that `ConfiguredSamplers` is constructed with `n_vocab` to handle the Mirostat case.
- It may also be prudent to define a `thiserror` error type for `ConfiguredSamplers`, so that library users can handle and report construction errors as they please.
Otherwise, I'm really happy with this and I'm looking forward to merging it. Having all of these different samplers and being able to combine them will be awesome!
- Expose the sampler configuration structures to allow more flexibility.
- Add more documentation and description for the sampling functions and structures.
- Create specific enums for sampler construction and sampling errors.
- Set `n_vocab` for the Mirostat 1 sampler in a more reliable way.
I think these changes should address most (hopefully all!) of your concerns. I went the route of just exposing the sampler configuration structures.
Fantastic, thank you. Yeah, that pretty much covers it all - I'm still not sure about the name of `build_sampler`, but that can be dealt with later. Great work on putting all of this together, it's much appreciated.
I'm not the best at naming things, and I also don't really have good familiarity with the codebase. Or if you want to give me specific directions for where stuff should be/how it should be named, I can also do that along with the necessary minor refactoring like imports and submit another pull. (Not really sure if that's easier than doing it yourself, but thought I'd offer.)
Yeah, no stress at all - I'd like to give the whole API a holistic makeover because it's desperately in need of a cleanup, anyway. I'll let you know if I need a hand or need something clarified, but from what I've seen so far, I shouldn't have any problem figuring it out. |
Now that this has been merged for a while, any issues/feedback? It doesn't seem like there were any actual issues created related to bugs or problems with the samplers. I created a discussion about the sequence repetition sampler over here: ggerganov/llama.cpp#2581 - I still haven't really figured out a good approach for it. Any thoughts on that would also be welcome (I can create a discussion or issue in this repo as well).
It's been pretty quiet user-wise and I haven't had much opportunity to work on the project. I'd be quite interested in the SRS (I've seen the problem it aims to solve quite a few times now in conversational contexts), but I'm also not sure what the best way to solve it is. I'll think about it and post to that discussion if I think of anything.
@KerfuffleV2 I had some time to play around with the samplers, but creating a custom sampler chain is kinda awkward. The easiest way I found to get it working was something like this:

```rust
// Yup, this is awful. But it works for now.
let sampler_string = format!(
    "repetition:last_n={last_n}:penalty={penalty}/topk:k={top_k}/topp:p={top_p}/temperature:temperature={temperature}",
    last_n = self.repetition_penalty_last_n,
    penalty = self.repetition_penalty,
    top_k = self.top_k,
    top_p = self.top_p,
    temperature = self.temperature
);
let sampler_config = &[sampler_string];
let sampler = llm_base::samplers::build_sampler(0, Default::default(), sampler_config).unwrap();
```

What's the intended way to build a simple top_k, top_p, temperature sampler chain?
Were the examples in the docs not the kind of thing you're looking for? https://docs.rs/llm-samplers/ — the main doc page there also has suggested ordering/combinations at the bottom based on how llama.cpp does it. There are also some examples in the tests.

Here's how I'd write your chain explicitly:

```rust
let mut sc = SamplerChain::<u32, f32>::new()
    + SampleRepetition::new(self.repetition_penalty, self.repetition_penalty_last_n)
    + SampleTopK::new(self.top_k, 1)
    + SampleTopP::new(self.top_p, 1)
    + SampleTemperature::new(self.temperature)
    + SampleRandDistrib::new();
```

(Note, not tested: you may need to add a few type annotations.) Rather than going through the string parsing, building the chain directly like this avoids the round-trip through `format!`. Please let me know if that didn't answer your question or you still don't like this approach.
@KerfuffleV2 Sometimes I'm baffled by my own stupidity 😅 Well, your examples were exactly what I was looking for. Maybe we should include some commonly used setups as preconfigured samplers in rustformers 🤔
Haha, no need to say anything like that. There wasn't really much documentation in the pull about using the samplers, or even in `llm-samplers` itself. Should probably just mention: I'm very likely to make breaking changes in `llm-samplers` going forward.
Warning: This is very lightly tested (but appears to work).
This pull adds optional (and off by default) integration with my `llm-samplers` crate (https://crates.io/crates/llm-samplers), which supports building modular sampler chains and includes all the samplers that `llama.cpp` currently supports. I tried to implement this in a non-invasive way.
The pull adds support for:
Of course it supports all the existing sampler types as well.
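As a taste of what one of those sampler types does, here's an independent from-scratch sketch of top-p (nucleus) filtering. This is an illustration of the technique, not the crate's implementation:

```rust
// Return the indices of the smallest set of tokens whose probabilities sum
// to at least `p`, in descending-probability order.
fn top_p_filter(probs: &[f32], p: f32) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    // Stable sort by probability, descending.
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let mut kept = Vec::new();
    let mut cum = 0.0_f32;
    for i in idx {
        kept.push(i);
        cum += probs[i];
        if cum >= p {
            break;
        }
    }
    kept
}
```

A full sampler would then renormalize over the kept tokens and draw from that distribution; the chain design means that draw is a separate sampler at the end of the chain.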
Caveats: Right now, using Mirostat v1 is really awkward because it needs to know the model vocabulary size, but it doesn't seem like that information is available at the time the samplers are constructed. I made it a commandline option, but it would obviously be a lot better if the right value was automatically supplied, since we do have that information (eventually).
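For context on why the vocabulary size matters here: Mirostat v1 derives its top-k cutoff from the target surprise and the vocabulary size N. Below is a hedged sketch of my reading of the Mirostat paper's formula (the actual implementation tracks a running μ rather than using τ directly, so treat this as illustrative only):

```rust
// k = ((eps * 2^tau) / (1 - N^(-eps)))^(1/s_hat), with eps = s_hat - 1,
// where s_hat is the estimated Zipf exponent and tau the target surprise.
// Without n_vocab (N), k cannot be computed.
fn mirostat_v1_k(n_vocab: usize, tau: f32, s_hat: f32) -> usize {
    let eps = s_hat - 1.0; // the paper's epsilon-hat
    let n = n_vocab as f32;
    let k = ((eps * 2.0_f32.powf(tau)) / (1.0 - n.powf(-eps))).powf(1.0 / s_hat);
    (k.round() as usize).clamp(1, n_vocab)
}
```

Raising the target surprise τ widens the cutoff, which is why the commandline workaround of passing the vocabulary size by hand is functional but clumsy.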
One thing I could use help with is tests for samplers. Not necessarily even code, just "Given X, Y, Z parameters to sampler A, we expect result B".
`llm-samplers` does pass all the tests from llama.cpp, but I don't have much faith in this signifying that everything works perfectly.

Closes #318