This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Use llm_samplers crate for sampler backend #359

Merged · 3 commits · Aug 6, 2023

Conversation

KerfuffleV2
Contributor

Warning: This is very lightly tested (but appears to work).

This pull adds optional (and off by default) integration with my llm-samplers crate (https://crates.io/crates/llm-samplers), which supports building modular sampler chains and includes all the samplers that llama.cpp currently supports.

I tried to implement this in a non-invasive way.

The pull adds support for:

  • Mirostat v1 and v2 samplers
  • Locally typical sampling
  • Tail free sampling
  • Frequency and presence penalties

Of course it supports all the existing sampler types as well.

Caveats: Right now using Mirostat v1 is really awkward because it needs to know the model vocabulary size but it doesn't seem like that information is available at the time the samplers are constructed. I made it a commandline option, but it would obviously be a lot better if the right value was automatically supplied since we do have that information (eventually).

One thing I could use help with is tests for samplers. Not necessarily even code, just "Given X, Y, Z parameters to sampler A, we expect result B". llm-samplers does pass all the tests from llama.cpp but I don't have much faith in this signifying everything works perfectly.

Closes #318

@KerfuffleV2
Contributor Author

Random extra stuff:

The next thing I'm thinking of adding to llm-samplers is grammar based sampling. And by add, I mean totally rip off ggerganov/llama.cpp#1773

Requests for adding other samplers that aren't yet implemented (I'm actually not aware of any) or for API changes are welcome.

I also don't have a very good track record for maintaining projects after the fun adding-fancy-stuff part is over. I still have some more stuff I want to do, but after that, if Rustformers wants to get added as a co-owner for llm-samplers I'm open to that.

@LLukas22
Contributor

LLukas22 commented Jul 7, 2023

This is an exciting development. I've been looking forward to experimenting with Mirostat V2, and this should make it quite effortless.

> I also don't have a very good track record for maintaining projects after the fun adding fancy stuff part is over. I still have some more stuff I want to do, but after that if Rustformers wants to get added as a co-owner for llm-samplers I'm open to that.

It might be a good idea to approach @philpax about incorporating this under the rustformers banner if you find that you no longer wish to work on it.

> Caveats: Right now using Mirostat v1 is really awkward because it needs to know the model vocabulary size but it doesn't seem like that information is available at the time the samplers are constructed. I made it a commandline option, but it would obviously be a lot better if the right value was automatically supplied since we do have that information (eventually).

To determine the vocabulary size, we either need to extract the hyperparameters from the model or load the tokenizers.json if we're employing Hugging Face's tokenizers. Perhaps we could consider constructing the sampler after the model/tokenizer is loaded.

> One thing I could use help with is tests for samplers. Not necessarily even code, just "Given X, Y, Z parameters to sampler A, we expect result B". llm-samplers does pass all the tests from llama.cpp but I don't have much faith in this signifying everything works perfectly.

Unit testing samplers might prove a bit challenging, especially once we begin to chain them.

My only concern with the current implementation is the significant increase in CLI parameters. However, I suppose it's a necessary trade-off if we're to support multiple different samplers via the command line.

@KerfuffleV2
Contributor Author

> incorporating this under the rustformers banner if you find that you no longer wish to work on it.

It's usually nothing as definite as that, I just get interested in other stuff. I also don't really want to completely give up control of my project. I guess what I'm talking about is more like giving rustformers an alternative to having to fork it once we get to that point. This is probably a while in the future, since I have plans for fun stuff to add like grammar-based sampling (which looks really interesting/useful).

> To determine the vocabulary size, we either need to extract the hyperparameters from the model

Right. The problem as far as I could see is that information isn't available at the time the samplers are getting set up. There also isn't really a convenient way to change it later on since the SamplerChain is basically a Vec<dyn Sampler> and since it got type erased you don't really even know which item is the mirostat v1 sampler.

This shouldn't be too much of a blocker. In the worst case, llm could elect to just not include mirostat v1. It seems like most people prefer v2 anyway (although personally I've seen better results with v1).

> Unit testing samplers might prove a bit challenging, especially once we begin to chain them.

Well, as long as the individual samplers work correctly, they should also work when chained (as long as the chain is meaningful, of course). So what I need is more comprehensive tests for the individual samplers. You can take a look at the existing tests in the llm-samplers repo to see how that looks: they're pretty easy to test, the only thing I'm lacking is knowing what the correct outputs are for a certain input.

> My only concern with the current implementation is the significant increase in CLI parameters.

I agree. I basically just copied the llama.cpp approach. If you have ideas for a better approach, I'm certainly open to making changes.

One thing that might make it a bit more manageable is just to add something like a --sampler- prefix to the sampler related options so they're all together. (Or I don't know, maybe clap has better ways to group options like that?)

Another approach might be to have something like a --sampler option that takes an argument which is just general stuff relating to samplers. Like --sampler temperature:0.8 --sampler mirostat1:lr=0.1,ent=5.0 (doesn't have to be like that, just an example of what that approach could look like).

I was actually thinking about adding FromStr instances for the samplers in the llm-samplers crate to allow constructing them from string input.
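A minimal sketch of what such a FromStr instance could look like, using a hypothetical Mirostat1Config with lr/ent fields (the type and field names here are illustrative placeholders, not the actual llm-samplers API):

```rust
use std::str::FromStr;

/// Hypothetical Mirostat v1 configuration; `lr` (learning rate / eta) and
/// `ent` (target entropy / tau) are illustrative names, not the real API.
#[derive(Debug, PartialEq)]
struct Mirostat1Config {
    lr: f32,
    ent: f32,
}

impl Default for Mirostat1Config {
    fn default() -> Self {
        Self { lr: 0.1, ent: 5.0 }
    }
}

impl FromStr for Mirostat1Config {
    type Err = String;

    /// Parses strings like "lr=0.1,ent=5.0"; unspecified keys keep defaults.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut cfg = Self::default();
        for pair in s.split(',').filter(|p| !p.is_empty()) {
            let (key, val) = pair
                .split_once('=')
                .ok_or_else(|| format!("expected key=value, got {pair:?}"))?;
            let val: f32 = val
                .parse()
                .map_err(|e| format!("bad value for {key}: {e}"))?;
            match key.trim() {
                "lr" => cfg.lr = val,
                "ent" => cfg.ent = val,
                other => return Err(format!("unknown option {other:?}")),
            }
        }
        Ok(cfg)
    }
}
```

With something like this in place, clap could delegate to `str::parse` and surface parse failures as ordinary argument errors.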

@philpax
Collaborator

philpax commented Jul 9, 2023

Fantastic work! Apologies about not getting back to you before, work got in the way.

> I also don't really want to completely give up control of my project. I guess what I'm talking about is more like giving rustformers an alternative to having to fork it once we get to that point.

That's totally fine. I'm happy keeping it as a dependency - if/when you want to move on, let us know and we can organise something, but I'm not fussed either way. You've done great work so far.

> Right. The problem as far as I could see is that information isn't available at the time the samplers are getting set up. There also isn't really a convenient way to change it later on since the SamplerChain is basically a Vec<dyn Sampler> and since it got type erased you don't really even know which item is the mirostat v1 sampler.

Yep, you can leave it as-is for now. We'll figure something out - probably revising the sampler interface to pass in the vocabulary size or something. (You could also do that, if you wanted.)

> You can take a look at the existing tests in the llm-samplers repo to see how that looks: they're pretty easy to test, the only thing I'm lacking is knowing what the correct outputs are for a certain input.

I'd suggest making tests with the current output for a given input, and then revising them if they turn out to be wrong. As you say, we don't really know what the "correct" behaviour is, so the most we can do is make sure it doesn't regress. That's what we've done with our integration tests so far - better to have something rather than nothing...

> Another approach might be to have something like a --sampler option that takes an argument which is just general stuff relating to samplers. Like --sampler temperature:0.8 --sampler mirostat1:lr=0.1,ent=5.0 (doesn't have to be like that, just an example of what that approach could look like).

I think that's the nicest approach that comes to mind. The many CLI options will lead to a confusing user experience, and it won't allow for chaining samplers.

I would have liked to have clap handle this for us somehow, but as far as I can tell it doesn't handle having variants as arguments - only enums with no payload - so we're forced to handle the parsing ourselves.

The only point of concern is that the discoverability of the sampler parameters on the CLI might be limited - it might end up looking like a ffmpeg command - but we can hopefully address that with documentation.

I'm happy to merge once the CLI sampler parameters are reduced.


Two other questions:

  1. Does the existing sampler need to be kept around with the new samplers being made available? I'm entirely fine with removing it - we're preparing for an interface break for 0.2 anyway.
  2. What's the reason for keeping the integration in a module? Just making it more visually obvious what its scope is? (I ask because, if we address 1, we can probably remove the module too.)

@philpax philpax added the issue:enhancement New feature or request label Jul 9, 2023
@KerfuffleV2
Contributor Author

KerfuffleV2 commented Jul 9, 2023

> Yep, you can leave it as-is for now.

One thing I was thinking is that a fairly easy solution would be to just implement my sampler trait for Box<dyn Sampler> and maybe Arc<Mutex<dyn Sampler>> as well (that's the trait from llm-samplers, not the one in llm already). That way, if there's a Mirostat1 sampler one could possibly just save it in a struct and update it later on. Since the Box/Arc conforms to the required trait it can also be in the sampler chain.

I think that may be the easiest way to deal with the situation.
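A rough sketch of that idea, using a minimal stand-in Sampler trait (the real llm-samplers trait operates on a logits structure and resources and has a different signature; everything here is a simplified placeholder):

```rust
use std::sync::{Arc, Mutex};

/// Minimal stand-in for the llm-samplers `Sampler` trait.
trait Sampler {
    fn sample(&mut self, logits: &mut Vec<f32>);
}

/// Blanket impl: a shared, mutable handle to any sampler is itself a
/// sampler, so it can live in a type-erased chain while the caller keeps
/// a clone of the handle for later updates.
impl<S: Sampler + ?Sized> Sampler for Arc<Mutex<S>> {
    fn sample(&mut self, logits: &mut Vec<f32>) {
        self.lock().unwrap().sample(logits);
    }
}

#[derive(Default)]
struct Mirostat1 {
    /// Unknown at construction time; filled in after the model is loaded.
    n_vocab: usize,
}

impl Sampler for Mirostat1 {
    fn sample(&mut self, _logits: &mut Vec<f32>) {
        assert!(self.n_vocab > 0, "n_vocab must be set before sampling");
        // ...actual Mirostat v1 logic would go here...
    }
}

struct SamplerChain(Vec<Box<dyn Sampler>>);

impl SamplerChain {
    fn sample(&mut self, logits: &mut Vec<f32>) {
        for s in &mut self.0 {
            s.sample(logits);
        }
    }
}
```

Usage would be along the lines of: keep `let miro = Arc::new(Mutex::new(Mirostat1::default()));`, push `Box::new(miro.clone())` into the chain, and later set `miro.lock().unwrap().n_vocab = vocab;` once the model is loaded.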

> I'd suggest making tests with the current output for a given input, and then revising them if they turn out to be wrong.

There are tests for every type of sampler already. So the issue is more that there need to be tests that actually prove stuff is functioning correctly.

> I'm happy to merge once the CLI sampler parameters are reduced.

Do you have any idea in mind of what you'd like it to look like?

If you show me an example of what you think is the ideal format for specifying the sampler params I can probably add something to parse it that way. (Or tell you why it won't work, if there's some issue.)

> Does the existing sampler need to be kept around with the new samplers being made available?

It doesn't need to be kept around; however, I wouldn't really suggest removing it until you're really confident the new samplers are working the way you expect/hope. I wrote the Rust versions by looking at the llama.cpp code and trying to understand what it was doing to the data. I didn't write them from a deep understanding of how the samplers themselves work.

I did port the tests from llama.cpp as well (and my versions pass them), however they aren't exactly very rigorous and I know of at least one example of a bug in the llama.cpp samplers that the existing tests didn't catch.

Anyway, that's why I took a pretty conservative approach with adding the llm-samplers support and left the old sampler in also. This also enables testing/comparing with the existing sampler without having to do stuff like switch between versions.

> What's the reason for keeping the integration in a module? Just making it more visually obvious what its scope is?

Yes, that's correct. Like you said, it could be moved out into the main scope if the existing sampler was removed (or even if it wasn't.)

@philpax
Collaborator

philpax commented Jul 9, 2023

> One thing I was thinking is a fairly easy solution is to just implement my sampler trait for Box<dyn Sampler> and maybe Arc<Mutex<dyn Sampler>> as well (that's the trait from llm-samplers, not the one in llm already). That way, if there's a Mirostat1 sampler one could possibly just save it in a struct and update it later on. Since the Box/Arc conforms to the required trait it can also be in the sampler chain.

Yes, I was also considering that, or extending the trait with a method that provides more information to the sampler post facto. I'm fine with either, but I have a slight preference for the latter because I assume the former would involve downcasting for Mirostat1. If it doesn't, ignore me.

...that being said, unless I'm mistaken, the model already exists and has been loaded by the time inference_parameters gets called. Is there a reason we can't get the tokenizer vocabulary size from the model and pass that into inference_parameters?

> There are tests for every type of sampler already. So the issue is more that there need to be tests that actually prove stuff is functioning correctly.

Right, yeah, not sure we can offer much help there. A lot of the work in the wider field is cowboy engineering, so there aren't really any existing test cases that we can use that I'm aware of. The original papers might have something, but I'd expect their results to be more summaries than specific scenarios.

> Do you have any idea in mind of what you'd like it to look like?

Not at all. I was nodding approvingly at the example you gave here:

> Another approach might be to have something like a --sampler option that takes an argument which is just general stuff relating to samplers. Like --sampler temperature:0.8 --sampler mirostat1:lr=0.1,ent=5.0 (doesn't have to be like that, just an example of what that approach could look like).

Something like that with a reasonable set of defaults and some documentation would be fine by me. I can't think of any improvements to that at this time; I suspect any such improvements will shake out of use with the interface, and that we won't be able to guess at it before then.

> It doesn't need to be kept around; however, I wouldn't really suggest removing it until you're really confident the new samplers are working the way you expect/hope.

Honestly... see my previous comment about cowboy engineering. If there's a bug with llm-samplers or the integration, we'll fix it when we encounter it. I'd rather commit to standardising all the samplers under one banner than to maintain two slightly different interfaces that users have to contend with.

You have my full permission to delete the old sampler and go all-in on llm-samplers here. Heck, you can probably remove the Sampler trait entirely; that doesn't exist in any released versions, so the cost of replacing it with your trait is relatively minimal. As long as it's easy enough for users to define their own samplers, I'm not that fussed.

Part of the reason I'd prefer to remove the old sampler(s) entirely is because, well, it's incredibly confusing to have LlmSamplersSampler and Sampler and LlmRsSamplerResources (especially if you were to look at the docs.) If we had to pick one, I'd rather go with the one that's already designed to be modular out of the box!

@KerfuffleV2
Contributor Author

I'm working on revisions based on the conversation here but it's going to take a few days. Probably Thursday or Friday.

@philpax philpax added this to the 0.2 milestone Jul 13, 2023
@KerfuffleV2 KerfuffleV2 changed the title Add optional llm_samplers sampler backend Use llm_samplers crate for sampler backend Jul 22, 2023
@KerfuffleV2
Contributor Author

Unfortunately this has taken a lot longer than expected. The changes on the llm-samplers side are basically complete as of today: KerfuffleV2/llm-samplers#3

I still have some testing and cleanup work to do and then I need to port (or maybe just reimplement) this pull and fix the merge conflicts. I doubt I'll finish all that tomorrow, but possibly Monday. Anyway, I haven't forgotten about this pull even though there hasn't been any visible progress until now.

@philpax
Collaborator

philpax commented Jul 23, 2023

No worries, take your time. Appreciate the hard work!

@KerfuffleV2 KerfuffleV2 force-pushed the feat-modular-samplers branch from 95fbf90 to 0b9ac6b Compare July 24, 2023 09:08
@KerfuffleV2
Contributor Author

Here's a question: Currently everything else in InferenceParameters got moved to other structures and the only thing remaining is the sampler, which is a little weird.

Depending on whether it's expected that the struct will be used for other parameters in the future, it could make sense to rename it to something like InferenceSampler. It could also possibly be changed to a type alias for Arc<dyn llm_samplers::Sampler>. Maybe this is something that should be dealt with in a different pull, but I just thought I'd mention it.

Note, this pull has the merge conflicts resolved but hasn't been updated to the new version of llm-samplers so it shouldn't be merged yet.

@LLukas22
Contributor

The removal of the parameters in InferenceParameters is on me. I had to move them to the SessionConfig to get graph planning and CUDA acceleration working. We can probably remove it and simply pass the sampler directly to the inference call.

@KerfuffleV2
Contributor Author

KerfuffleV2 commented Jul 24, 2023

What do we think of this general approach?

Note: There's still a considerable amount of cleanup that needs to be done before it should be merged. Right now, running inference does seem to work.

I tried to hide as much of the implementation details in samplers.rs as I could.

One thing that makes things complex is that llm may want to have its own defaults for samplers that differ from what llm-samplers uses. That's what the new_ methods in SamplerSettings (edit: renamed to ConfiguredSamplers) are about.

The general description of how it works: clap's parsing machinery is used to parse a string into a sampler, starting with llm's idea of what the defaults should be. The result is a list of the samplers that were configured by the user via commandline arguments. Based on that, we build the sampler chain. There's also complexity involved there because the shape of the chain differs depending on the settings: samplers like locally typical can't be used with mirostat, for example.

Anyway, if the consensus is that this approach is acceptable I'll continue with cleaning it up and getting it mergeable. Otherwise I guess we'll see!

edit: This is now reasonably clean.

@KerfuffleV2
Contributor Author

One thing I'm not so sure about is how stuff should look from an API standpoint or what exactly the best location for some structure is. Assuming the general approach is okay, it might make sense to merge this and then have someone who actually has a vision for how that stuff is supposed to fit together just rename/reorganize stuff.

Not that I'm unwilling to make that kind of change if requested, I just don't really have the vision/familiarity to necessarily make the optimal choices for that kind of stuff.

@philpax
Collaborator

philpax commented Jul 25, 2023

Great work, I'll review this in the next day or two and get back to you

@philpax left a comment (Collaborator)


Looks good, but ConfiguredSamplers being many Options is a little odd. Would it be possible to use a type-erased array instead?

@KerfuffleV2
Contributor Author

Thanks for the review.

> but ConfiguredSamplers being many Options is a little odd. Would it be possible to use a type-erased array instead?

Unfortunately not, at least with the approach I'm using. If it's not clear, that's a temporary structure used to hold the configured samplers before they're built into a chain (which is basically the type-erased array; you can just think of it as dyn Sampler).

This is how it works currently:

  1. Via Clap's argument parsing, an actual sampler is built from a string description (the commandline argument). The reason to do it that way is that building the sampler can fail if the options are wrong, so I wanted to let Clap's argument error handling deal with that. Since the commandline argument can be specified multiple times, this produces a Vec<ConfiguredSampler>.
  2. The Vec<ConfiguredSampler> gets built into the ConfiguredSamplers struct (which just has an Option field for each supported sampler type). The reason we can't use this directly at the option parsing stage is that Clap doesn't provide (as far as I can see) a way to fold options; you can only set or append. This part also cannot be type-erased because the samplers have to be built into a chain in a specific order.
  3. ConfiguredSamplers gets built into a chain (the type provided by llm_samplers). But the order of the chain matters, and the samplers that can be used also depend on some things. By "some things", right now I mainly mean whether Mirostat samplers are used or not. If you want to see the way the chains need to be constructed (or at least how llama.cpp does it; I didn't try to mess with it) you can look here: https://docs.rs/llm-samplers/0.0.4/llm_samplers/index.html#suggested-chainsordering

Also, if it's not clear, those first two structs are only used very temporarily during initialization and result in an actual dyn Sampler (which is a SamplerChain) that is used for the actual sampling. One thing that might make it more palatable is moving those ConfiguredSampler/ConfiguredSamplers types into the CLI crate. Right now, they have to be public because CLI handling needs them but the samplers module is in a different crate.
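The pipeline described in those three steps could be sketched roughly like this (all type and variant names are simplified placeholders, not the actual llm implementation, and the "chain" here is just a list of spec strings to keep the sketch small):

```rust
/// One parsed `--sampler` argument (hypothetical variants).
enum ConfiguredSampler {
    Temperature(f32),
    TopK(usize),
    TopP(f32),
}

/// Per-sampler Options, so a later argument overrides an earlier one and
/// the chain can be assembled in a fixed, known-good order afterwards.
#[derive(Default)]
struct ConfiguredSamplers {
    temperature: Option<f32>,
    top_k: Option<usize>,
    top_p: Option<f32>,
}

impl ConfiguredSamplers {
    /// Folds the repeated CLI arguments into one struct (step 2).
    fn from_args(args: Vec<ConfiguredSampler>) -> Self {
        args.into_iter().fold(Self::default(), |mut acc, s| {
            match s {
                ConfiguredSampler::Temperature(t) => acc.temperature = Some(t),
                ConfiguredSampler::TopK(k) => acc.top_k = Some(k),
                ConfiguredSampler::TopP(p) => acc.top_p = Some(p),
            }
            acc
        })
    }

    /// Builds the chain in a fixed order (top-k, then top-p, then
    /// temperature), skipping samplers the user didn't configure (step 3).
    fn into_chain(self) -> Vec<String> {
        let mut chain = Vec::new();
        if let Some(k) = self.top_k {
            chain.push(format!("topk:k={k}"));
        }
        if let Some(p) = self.top_p {
            chain.push(format!("topp:p={p}"));
        }
        if let Some(t) = self.temperature {
            chain.push(format!("temperature:{t}"));
        }
        chain
    }
}
```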

(BTW, don't merge this yet even if you're okay with it after this explanation. I still have a bit more cleanup to do and I'd want to release a version of llm-samplers so Cargo.toml can point at a normal package instead of a tag in the repo.)

@philpax
Collaborator

philpax commented Jul 27, 2023

Gotcha, I understand. Looking forward to it!

@KerfuffleV2
Contributor Author

From that response, I'm assuming you didn't require any additional changes like moving stuff around or renaming it. This latest set of changes is mainly just cleanups and improvements to the commandline help text. (I added more information, sorted it alphabetically, etc.)

This should be ready to merge now (I'm assuming the automated checks will pass). Just want to add again though that correct behavior for the samplers hasn't really been tested extensively (and I don't really have the information to know exactly what would be "correct"). If you find anything not working as expected, please let me know.

@KerfuffleV2 KerfuffleV2 force-pushed the feat-modular-samplers branch from 194f671 to ec9052e Compare July 28, 2023 22:16
@LLukas22 LLukas22 requested a review from philpax July 29, 2023 08:21
@KerfuffleV2 KerfuffleV2 marked this pull request as draft August 2, 2023 00:03
@KerfuffleV2
Contributor Author

Not sure what was up with this pull since unless I misunderstood there weren't any further changes required.

However, in the meantime I think I came up with a better approach to the ConfigurableSampler trait stuff that will allow a nicer implementation for consumers like llm if things work out how I plan (which certainly isn't always the case). I'll know in a couple days, hopefully.

I've also added another sampler that I had an idea for. I'll just paste the rustdoc description:


Sequence Repetition

This sampler penalizes repeating sequences of tokens that have already been seen within the
last_n window. It is fairly complicated, so here is an example. Suppose we have generated
this sequence of tokens: 1, 2, 3, 4, 1, 2, 3

Disregarding tolerance and max_merge for now, if min_length is 3, then
4 would be the token ID selected to penalize here. This is because the last
tokens are 1, 2, 3 and if we generate a 4 then we'll have created a sequence that
already exists: 1, 2, 3, 4.

If tolerance was 1 and the sequence was 1, 8, 3, 4, 1, 2, 3
we would still penalize 4 since the 8 gets "tolerated". If we also set max_merge=2 and
the sequence was 1, 7, 8, 3, 4, 1, 2, 3 it would still count as a match and 4 would
be penalized.

Warning: Very alpha code, likely has significant bugs.

Properties:

  • Modifies logits

Parameters:

  • last_n: Number of last tokens to consider. (default: 64)
  • min_length: The minimum length for a sequence to match. (default: 0)
  • flat_penalty: Flat penalty to apply to the token that would continue the matched sequence. (default: 0)
  • stacking_penalty: Stacking penalty applied to the token that would continue the matched sequence;
    it is multiplied by the sequence length. (default: 0.0)
  • tolerance: Tolerance basically acts like a wildcard to allow fuzzy sequence matching.
    For example, if tolerance is set to 1, then 1, 6, 3 could match with 1, 2, 3. (default: 0)
  • max_merge: Controls the number of consecutive non-matching tokens that
    the tolerance wildcard can match. Setting this to 0 or 1 deactivates it.
    Setting it to 2 would allow 1, 6, 6, 3 to match with 1, 2, 3. (default: 1)

I'm not sure if you'd want to include this as well, but I can if there's interest. I'm almost positive the matching algorithm has a few bugs but it may still work well enough to be useful (seems to in my tests). This is an approach I came up with myself so there isn't a reference implementation.

The advantage over stuff like the existing repetition and frequency/presence samplers is that they don't take context into account so you can stop specific tokens from getting repeated but this is a pretty blunt instrument. Also, penalizing common short words like "if", "and", "or" because they get repeated frequently can cause some strange effects.
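Ignoring tolerance and max_merge, the core matching rule described above can be sketched as a small function (an illustrative sketch, not the actual llm-samplers code):

```rust
/// Given the generated tokens and a minimum match length, returns the token
/// that would continue an already-seen sequence (and thus should be
/// penalized), if any. This ignores `tolerance` and `max_merge` for
/// simplicity; it only handles exact suffix matches.
fn seq_repetition_target(tokens: &[u32], min_length: usize) -> Option<u32> {
    let n = tokens.len();
    if min_length == 0 || n < min_length + 1 {
        return None;
    }
    // The suffix we are about to extend.
    let suffix = &tokens[n - min_length..];
    // Look for an earlier occurrence of that suffix; the token that
    // followed it is the one a new generation would repeat.
    for start in 0..n - min_length {
        if &tokens[start..start + min_length] == suffix {
            return Some(tokens[start + min_length]);
        }
    }
    None
}
```

For the example above, the last three tokens of `1, 2, 3, 4, 1, 2, 3` match the earlier `1, 2, 3`, so the token that followed it (4) is the one to penalize.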

@LLukas22
Contributor

LLukas22 commented Aug 3, 2023

> Not sure what was up with this pull since unless I misunderstood there weren't any further changes required.

I guess @philpax is currently a bit busy working on the GGUF standard, and I don't feel comfortable merging this, so I guess we have to wait a bit. And this PR is still marked as a draft 😅

> I'm not sure if you'd want to include this as well, but I can if there's interest. I'm almost positive the matching algorithm has a few bugs but it may still work well enough to be useful (seems to in my tests). This is an approach I came up with myself so there isn't a reference implementation.

I think we can include it, as it's an optional sampler. But maybe we don't have to pass all parameters to the CLI and just keep it for other people to use in their projects if they want to 🤔

@philpax
Collaborator

philpax commented Aug 3, 2023

Hey! Sorry - as Lukas said, I've been really busy and have only been merging maintenance PRs. What you've described sounds great and I've no problems with it. I can try getting this in on the weekend, or I can wait for you to play with that. Just let me know which one works best for you.

@KerfuffleV2
Contributor Author

> What you've described sounds great and I've no problems with it. I can try getting this in on the weekend

This weekend would be fine, I think I can have my changes ready by early Sunday (or hopefully sooner).

> I think we can include it, as it's an optional sampler. But maybe we don't have to pass all parameters to the CLI and just keep it for other people to use in their projects if they want to

Well, I think this would mainly be a question of whether it should be in the commandline options (or the default set of samplers, but at least with the approach I took that's essentially the same thing). A sampler can be a chain of samplers from llm-samplers, so basically llm doesn't really have to explicitly support any particular sampler if someone is setting the sampler field in the inference parameters or whatever. It just has to conform to the correct trait.

Also, I'm not sure if supporting the sampler via the CLI but leaving out some of the options it supports would really be that useful. For some samplers, like Mirostat, there are a few options that it's very unlikely people would want to change directly. All the options in Sequence Repetition are likely to be things people would want to tweak: the length of matches, the tolerance to partial matches, the window in which matching occurs. I know I'd use all of those.

I'd actually really like to figure out a way to allow specifying multiple samplers of the same kind with llm (llm-samplers can easily do that via its API) because I think having multiple instances of some samplers could be extremely useful. Sequence Repetition in particular: I'd want to have an SR sampler with a fairly short length and window, and then another one with a really large window and pretty long sequence length (7-10 tokens). I've found it's pretty common for models to get stuck where they'll repeat paragraphs of text from the prompt or previous generation when you're generating long output in the 2,000-4,000 token range (and with the RoPE stuff, generating 8,000+ tokens is starting to get practical).

Anyway, for this part (just the SR sampler) I could still use some direction since it's not completely clear what is desired. In a perfect world, I'll also get time to fix all potential issues there and be satisfied that it will work pretty reliably, but that's definitely not guaranteed to happen before the weekend.

@KerfuffleV2 KerfuffleV2 marked this pull request as ready for review August 6, 2023 12:41
@KerfuffleV2
Contributor Author

Okay, I think we're finally there. I was hoping to have this ready early today but these were significant and complex changes on the llm-samplers side. I think it was worth it though and it looks a lot nicer for consumers of the crate as well.

I had to make creating InferenceParameters fallible since sampler construction can fail at that point.

It's possible to specify -s multiple times or combine setting multiple samplers at the same time like -s repetition:last_n=64:penalty=1.2/topk:k=30/locallytypical:p=0.3/topp:p=0.95/mirostat1 (spaces also work as a separator). _ and - are ignored in sampler names so people can write top-p or top_p instead of topp.

It's also possible to add more than one instance of samplers where that makes sense (seqrepetition, repetition, and freqpresence currently).
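The name normalization and separator handling described here could be sketched roughly as below (an illustrative sketch; only the `_`/`-` stripping and the `/`/space separators are stated above, so the lowercasing is an assumption):

```rust
/// Normalizes a sampler name: `_` and `-` are ignored, so `top-p`, `top_p`
/// and `topp` all compare equal. Lowercasing is an assumption here, not
/// something stated in the thread.
fn normalize_sampler_name(name: &str) -> String {
    name.chars()
        .filter(|c| *c != '_' && *c != '-')
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Splits a combined `-s` argument into individual sampler specs; `/` and
/// whitespace both act as separators.
fn split_sampler_arg(arg: &str) -> Vec<&str> {
    arg.split(|c: char| c == '/' || c.is_whitespace())
        .filter(|s| !s.is_empty())
        .collect()
}
```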

You'll get a reasonable error if you specify incompatible sampler settings like Locally Typical + Mirostat:

✓ Loaded 291 tensors (3.8 GB) after 87ms
Error: 
   0: Invalid sampler configuration: Cannot enable top-p, top-k, locally typical or tail free samplers with Mirostat 1 or 2

Note: It's possible to use the Sequence Repetition sampler but it doesn't currently appear in the help text. It wasn't clear what was desired regarding that. I can add it to the help text if you want, otherwise it's basically a hidden option that people who know about it can try out.

@philpax left a comment (Collaborator)


Fantastic work, I'm a big fan of the approach. Ready to merge, just have a few things I'd like changed:

  • re-export llm-samplers or its contents somewhere, so that library users can construct sampler chains / name the relevant types without an additional dependency on llm-samplers
  • document what the default sampler chain does
  • consider exporting ConfiguredSamplers and/or renaming build_sampler to indicate that it is specifically parsing a list of strings to construct a sampler chain from ConfiguredSamplers. At present, it's not clear that there are non-string ways to construct a sampler chain, and build_sampler looks like the only way.
    • Another thought that comes to mind is to move all of the parsing logic (including build_sampler) into llm-cli - as there are currently CLI-specific concerns (like the bias) in there, but library users might want to use the same parsing logic. Maybe remove build_sampler entirely, inline its logic into the CLI (but keep ConfiguredSamplers public), and require that ConfiguredSamplers is constructed with n_vocab to handle the Mirostat case.
    • It may also be prudent to define a thiserror error type for ConfiguredSamplers, so that library users can handle and report construction errors as they please.

Otherwise, I'm really happy with this and I'm looking forward to merging it. Having all of these different samplers and being able to combine them will be awesome!

Expose the sampler configuration structures to allow more flexibility.

Add more documentation and description for the sampling functions and structures.

Create specific enums for sampler construction and sampling errors.

Set n_vocab for the Mirostat 1 sampler in a more reliable way.
@KerfuffleV2
Contributor Author

I think these changes should address most (hopefully all!) of your concerns.

I went the route of just exposing the ConfiguredSamplers struct with a warning that one generally shouldn't construct it manually. I also added documentation about the default chain, but... it's complicated and depends on a couple of conditions, so it's hard to explain clearly without getting overwhelmingly long.

@philpax
Collaborator

philpax commented Aug 6, 2023

Fantastic, thank you. Yeah, that pretty much covers it all - I'm still not sure about the name of build_sampler, but I'll ruminate on it and update it in main (probably as part of #221).

Great work on putting all of this together, it's much appreciated.

@philpax philpax merged commit c3b868a into rustformers:main Aug 6, 2023
@KerfuffleV2
Contributor Author

I'm still not sure about the name of build_sampler

I'm not the best at naming things, and I also don't have much familiarity with the llm API/conventions/etc., so unfortunately there's probably a lot of room for improvement in both the names and how stuff is organized. Absolutely feel free to rename/move everything around as you see fit (not that you need to ask for permission).

Or, if you want to give me specific directions for where stuff should live and how it should be named, I can do that along with the necessary minor refactoring (imports and the like) and submit another pull. (Not really sure that's easier than doing it yourself, but I thought I'd offer.)

@philpax
Collaborator

philpax commented Aug 6, 2023

Yeah, no stress at all - I'd like to give the whole API a holistic makeover because it's desperately in need of a cleanup, anyway. I'll let you know if I need a hand or need something clarified, but from what I've seen so far, I shouldn't have any problem figuring it out.

@hhamud hhamud mentioned this pull request Aug 7, 2023
@KerfuffleV2
Contributor Author

Now that this has been merged for a while, any issues/feedback? It doesn't seem like any issues have been created for bugs or problems with the samplers.

I created a discussion about the sequence repetition sampler over here: ggerganov/llama.cpp#2581

I still haven't figured out a really good approach for it. Any thoughts on that would also be welcome (I can create a discussion or issue in this repo as well).

@philpax
Collaborator

philpax commented Aug 13, 2023

It's been pretty quiet user-wise and I haven't had much opportunity to work on llm lately, but I recently updated llmcord to use it and everything seems to work fine. I'll let you know/make an issue if anything comes up with the samplers themselves, but everything's looking good right now.

I'd be quite interested in the SRS (I've seen the problem it aims to solve quite a few times now in conversational contexts), but I'm also not sure what the best way to solve it is. I'll think about it and post to that discussion if I think of anything.

@LLukas22
Contributor

@KerfuffleV2 I had some time to play around with the samplers, but creating a custom sampler chain is kinda awkward.

The easiest way to get it working with rustformers was to build a sampler chain from a string, which results in this atrocity:

    // Yup, this is awful. But it works for now.
    let sampler_string = format!(
        "repetition:last_n={last_n}:penalty={penalty}/topk:k={top_k}/topp:p={top_p}/temperature:temperature={temperature}",
        last_n = self.repetition_penalty_last_n,
        penalty = self.repetition_penalty,
        top_k = self.top_k,
        top_p = self.top_p,
        temperature = self.temperature,
    );

    let sampler_config = &[sampler_string];

    let sampler = llm_base::samplers::build_sampler(0, Default::default(), sampler_config).unwrap();

What's the intended way to build a simple top_k/top_p/temperature sampler chain?

@KerfuffleV2
Contributor Author

@LLukas22

Were the examples in the docs not the kind of thing you're looking for? https://docs.rs/llm-samplers/ — the main doc page there also has suggested ordering/combinations at the bottom based on how llama.cpp does it.

There are also some examples in src/tests.rs: https://github.com/KerfuffleV2/llm-samplers/blob/main/src/tests.rs

Here's how I'd write your chain explicitly:

    let mut sc = SamplerChain::<u32, f32>::new()
        + SampleRepetition::new(self.repetition_penalty, self.repetition_penalty_last_n)
        + SampleTopK::new(self.top_k, 1)
        + SampleTopP::new(self.top_p, 1)
        + SampleTemperature::new(self.temperature)
        + SampleRandDistrib::new();

(Note, not tested: you may need to add a few type annotations.)

Rather than using new(), you can also take a default instance and modify just the fields you want to change, like SampleTopK::default().k(self.top_k) - a bit longer, but it may make the code clearer. SamplerChain also implements += so you can build a chain incrementally as well.

Please let me know if that didn't answer your question or you still don't like this approach.
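For readers curious what a chain like that actually does to the raw logits, the following is a plain-Rust sketch of the top-k and temperature/softmax steps, based on the standard definitions of those samplers. It is a standalone illustration only, not the llm-samplers implementation (the function names here are invented).

```rust
// Illustrative only: the kind of transforms a top-k -> temperature chain
// applies to (token id, logit) candidates before the final random pick.

fn top_k_filter(candidates: &mut Vec<(u32, f32)>, k: usize) {
    // Keep only the k highest-logit candidates.
    candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    candidates.truncate(k);
}

fn softmax_with_temperature(candidates: &mut Vec<(u32, f32)>, temperature: f32) {
    // Temperature < 1.0 sharpens the distribution, > 1.0 flattens it.
    let max = candidates.iter().map(|&(_, l)| l).fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for (_, l) in candidates.iter_mut() {
        *l = ((*l - max) / temperature).exp();
        sum += *l;
    }
    for (_, l) in candidates.iter_mut() {
        *l /= sum;
    }
}

fn main() {
    // Token ids paired with raw logits.
    let mut candidates: Vec<(u32, f32)> = vec![(0, 1.0), (1, 3.0), (2, 2.0), (3, 0.5)];
    top_k_filter(&mut candidates, 2);
    assert_eq!(candidates.iter().map(|c| c.0).collect::<Vec<_>>(), vec![1, 2]);

    softmax_with_temperature(&mut candidates, 0.8);
    let total: f32 = candidates.iter().map(|c| c.1).sum();
    assert!((total - 1.0).abs() < 1e-5); // now a probability distribution
    assert!(candidates[0].1 > candidates[1].1); // token 1 remains most likely
    println!("{:?}", candidates);
}
```

A final sampler such as SampleRandDistrib would then draw one token id from the resulting distribution.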

@LLukas22
Contributor

@KerfuffleV2 Sometimes I'm baffled by my own stupidity. 😅

Well, your examples were exactly what I was looking for. Maybe we should include some commonly used setups as preconfigured samplers in rustformers 🤔

@KerfuffleV2
Contributor Author

Sometimes I'm baffled by my own stupidity.

Haha, no need to say anything like that. There wasn't much documentation in the pull about using llm-samplers specifically (I guess I expected people to just go look at the crate). llm-samplers itself definitely isn't perfect, but there is a decent amount of information in the docs, as well as examples. (Feedback on where that falls short or is unclear is definitely welcome.)

Maybe we should include some commonly used setups as preconfigured samplers in rustformers

Or even in llm-samplers - I'd be open to adding stuff like that. SamplerChain is itself a sampler, so you can put chains inside other chains, which should make it pretty easy to build common components that can be mixed and matched.

I should probably mention: I'm very likely to make breaking changes in llm-samplers, which is why llm pins it to a specific version. I plan to keep llm updated when I do that sort of thing. The way chains work probably won't change much, but I really hate how sampler resources currently work: hardcoding specific resources in a trait is pretty nasty, and supporting something like CFG would probably require adding yet another accessor to the trait.
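To illustrate the "hardcoded resources in a trait" problem being described, here is a hypothetical minimal version of the pattern. The trait and method names are invented for this sketch and are not the actual llm-samplers API; the point is that each new kind of shared state forces another accessor method onto the trait.

```rust
// Hypothetical sketch of a sampler-resources trait with a hardcoded
// accessor. Supporting a new resource (RNG, CFG context, ...) would mean
// adding another with_* method here, which is what makes this rigid.

trait SamplerResources {
    fn with_last_tokens(&self, f: &mut dyn FnMut(&[u32]));
}

struct SimpleResources {
    last_tokens: Vec<u32>,
}

impl SamplerResources for SimpleResources {
    fn with_last_tokens(&self, f: &mut dyn FnMut(&[u32])) {
        f(&self.last_tokens);
    }
}

// A repetition-style sampler would consume the resource like this.
fn count_recent_repeats(res: &dyn SamplerResources, token: u32) -> usize {
    let mut count = 0;
    res.with_last_tokens(&mut |toks| {
        count = toks.iter().filter(|&&t| t == token).count();
    });
    count
}

fn main() {
    let res = SimpleResources { last_tokens: vec![5, 7, 5, 5] };
    assert_eq!(count_recent_repeats(&res, 5), 3);
    assert_eq!(count_recent_repeats(&res, 9), 0);
    println!("ok");
}
```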
