I found a way to use these models directly with Text Generation WebUI #24
Comments
Nice! If you want, feel free to update the README in a PR. I can also add/commit your instructions directly when I get time, whichever works better.
@paolorechia Your repo is exactly what I was looking for! Thank you for your effort!
The newer models, specifically designed to be more conversational, are inconsistent for my particular use case. I'm working on a user-guided app that generates technical content using third-party APIs. My goal is to obtain structured JSON output from the models.
I may be completely wrong, but if I were you I would consider generating some JSON outputs with the OpenAI API based on your requests and fine-tuning Vicuna or WizardLM 7B on them. Otherwise, maybe you don't even need detailed outputs, but rather make the model generate proper JSON when it's asked to do so. Hopefully some existing JSON data would suffice to achieve that.
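In case it helps, here is a rough sketch of what that data-generation step could look like, assuming the pre-1.0 openai Python client; the prompt list is hypothetical and the fine-tuning format would depend on your app:

```python
import json
import openai  # pre-1.0 client, e.g. pip install openai==0.28

openai.api_key = "sk-..."  # your key

# Hypothetical example requests from the app; replace with real ones.
requests_to_cover = [
    "List three REST endpoints for a blog API as JSON.",
    "Describe a database table for users as JSON.",
]

with open("json_finetune_data.jsonl", "w") as f:
    for user_request in requests_to_cover:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Answer only with valid JSON."},
                {"role": "user", "content": user_request},
            ],
            temperature=0,
        )
        answer = resp["choices"][0]["message"]["content"]
        # Keep only responses that actually parse as JSON.
        try:
            json.loads(answer)
        except json.JSONDecodeError:
            continue
        # One instruction/output pair per line, ready for LoRA fine-tuning.
        f.write(json.dumps({"instruction": user_request, "output": answer}) + "\n")
```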
@GDrupal interesting undertaking!
Sorry, @GDrupal, just re-read your original comment on my desktop and noticed you did mention prompting the models. Vicuna 1.1 is also pretty garbage when it comes to generating Python code; it's full of syntax errors. I get much better results from WizardLM 7B unquantized, so far the best of the models I've tried for use as a langchain agent with access to a Python REPL (I also tried Vicuna 1.1 7B/13B and stable-vicuna). On the topic of training a "soldier", I'm planning on fine-tuning a LoRA to perform these actions. Here's my plan:
No idea whether it will work; I'm excited to try it out and see what happens :)
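For anyone curious about the Python REPL agent mentioned above, here is a rough sketch of one way to wire it up with the 2023-era langchain API. The `OpenAI` LLM is only a stand-in; in practice it would be whatever wrapper points at the local WizardLM/Vicuna server, and this is not necessarily this repo's exact code:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.llms import OpenAI  # placeholder LLM; swap for a local-model wrapper
from langchain.tools.python.tool import PythonREPLTool

# Placeholder LLM; replace with a wrapper around the local model's API.
llm = OpenAI(temperature=0)

tools = [
    Tool(
        name="Python REPL",
        func=PythonREPLTool().run,
        description="Executes Python code and returns the printed output.",
    )
]

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("Write and execute Python code that prints the first 10 Fibonacci numbers.")
```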
Updated the documentation with a link to this issue. Thanks again!
From the README:
"If you try an unsupported model, you'll see "gibberish output".
This happens for instance with https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
If you know how to use these models directly with Text Generation WebUI please share your expertise :)"
I managed to get this working locally on Linux with:
https://huggingface.co/4bit/vicuna-13B-1.1-GPTQ-4bit-128g
https://huggingface.co/TheBloke/wizard-vicuna-13B-GPTQ
https://huggingface.co/4bit/gpt4-x-alpaca-13b-native-4bit-128g-cuda
https://huggingface.co/4bit/stable-vicuna-13B-GPTQ
In case it helps, my setup:
eb3be97 ("I noticed a slowdown. Revert the code.", 2023-04-26)
Load with:
```
python server.py --model vicuna-13B-1.1-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type Llama --api
```
Currently running the models on an NVIDIA A2000 and consuming them from langchain just through the API endpoint... but simple stuff, no agents. Alpaca and Vicuna 1.1 are the best ones for me so far.
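For reference, calling that endpoint can look roughly like the sketch below. It assumes the WebUI was started with `--api` and exposes the blocking endpoint at `http://localhost:5000/api/v1/generate`; the exact URL and payload fields vary between WebUI versions, so check the api extension docs for your build:

```python
import requests

API_URL = "http://localhost:5000/api/v1/generate"  # assumed endpoint; version-dependent


def generate(prompt: str, max_new_tokens: int = 200) -> str:
    # Send a blocking generation request to the local Text Generation WebUI API.
    payload = {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": 0.7,
        "stopping_strings": ["### Human:"],  # adjust to the model's prompt format
    }
    response = requests.post(API_URL, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()["results"][0]["text"]


if __name__ == "__main__":
    print(generate("### Human: Write a haiku about GPUs.\n### Assistant:"))
```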
I was about to try using embeddings and found your repo... great work!
Trying to understand how you managed to get embeddings working xD.
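While digging through the repo, it may help to know that a common way to get local embeddings working with langchain at the time was a sentence-transformers model via `HuggingFaceEmbeddings`. This is a generic sketch (requires sentence-transformers and faiss-cpu), not necessarily what this repo actually does:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Local embedding model (downloaded from the Hugging Face Hub on first use).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

texts = [
    "Vicuna is a chat model fine-tuned from LLaMA.",
    "Text Generation WebUI exposes an API for local models.",
]

# Build a small FAISS index and run a similarity search against it.
db = FAISS.from_texts(texts, embeddings)
docs = db.similarity_search("How do I call a local model over an API?", k=1)
print(docs[0].page_content)
```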