How to save a trained model so it can be loaded with HF from_pretrained()? #832
After some initial research, the functionality I'm imagining appears to already be implemented for Llama 2 in Huggingface's Llama weight-conversion script. I opened an issue to add support for Llama 3 to that function. I successfully converted a Llama 2 model saved with the meta checkpointer to a format that can be loaded with `from_pretrained()`.
@calmitchell617 thanks for opening this issue! Actually, we do support HF formats directly in torchtune by using this function (more details here). I believe this should work OOTB for the safetensors files available in the llama3 repo, but I can confirm with you in a bit. Generally, we've designed torchtune to be state-dict invariant, so the format of the input checkpoint is the format we write back out. The reason is exactly what you mentioned above, i.e. better interop with other tools in the ecosystem. Let me know if this helps.
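For anyone landing here later, here is a minimal sketch of what using the HF checkpointer directly might look like. The module path, class name, and constructor arguments are assumptions based on recent torchtune versions and may differ; in practice the checkpointer is usually configured via the recipe YAML rather than instantiated by hand.

```python
# Hedged sketch: instantiating torchtune's HF checkpointer directly.
# Module path, class name, and argument names are assumptions and may
# differ between torchtune versions; normally this lives in the recipe config.
from torchtune.utils import FullModelHFCheckpointer

checkpointer = FullModelHFCheckpointer(
    checkpoint_dir="/tmp/Meta-Llama-3-8B-Instruct",  # placeholder path
    checkpoint_files=[
        "model-00001-of-00004.safetensors",
        "model-00002-of-00004.safetensors",
        "model-00003-of-00004.safetensors",
        "model-00004-of-00004.safetensors",
    ],
    model_type="LLAMA3",
    output_dir="/tmp/finetuned-llama3",  # placeholder path
)

# Because the checkpointer is state-dict invariant, checkpoints are read in
# HF format here and written back out in HF format after fine-tuning.
state_dict = checkpointer.load_checkpoint()
```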
The point about LoRA adapters is a great one! I had an offline chat with @BenjaminBossan about this at some point. Let me follow up on this and see if we can build more interop here with PEFT.
@kartikayk, thanks for the quick follow-up. A PEFT integration would be very handy, but the other thing is more pressing for me right now. I did come across that function you mentioned, but am still not sure how to accomplish my use case. Maybe I am not understanding an existing way of doing things. Instead of prescribing a solution, let me try to be really specific with my use case. Hopefully I'm just making things harder than they need to be, and you will have an easy solution. I want to:
Two issues I see with step 3 of that list are:
Again, thank you for your fast responses. Hopefully I'm just not seeing an existing solution.
@calmitchell617 I just tried what I had mentioned and realized the folly of what I said above :) So I was under the impression that the "safetensors" files are compatible with `from_pretrained()` out of the box, but that doesn't seem to be the case.
@calmitchell617 ok I think I can convert the weights around correctly using the following code:
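(The code block from this comment didn't survive; below is a rough sketch of the kind of weight conversion being described, assuming torchtune's `convert_weights` helpers. The module path, function names, and signatures are assumptions and may differ between versions.)

```python
# Rough sketch of converting a meta-format checkpoint to HF key names.
# Assumes torchtune exposes meta_to_tune / tune_to_hf in
# torchtune.models.convert_weights; names and signatures may differ by version.
import torch
from torchtune.models import convert_weights

# Checkpoint written by torchtune's meta checkpointer (placeholder path).
meta_state_dict = torch.load(
    "/tmp/Meta-Llama-3-8B-Instruct/meta_model_0.pt", map_location="cpu"
)

# Meta key names -> torchtune's native key names.
tune_state_dict = convert_weights.meta_to_tune(meta_state_dict)

# torchtune key names -> HF (transformers) key names.
# The head counts and hidden dim shown are Llama 3 8B values.
hf_state_dict = convert_weights.tune_to_hf(
    tune_state_dict, num_heads=32, num_kv_heads=8, dim=4096
)

torch.save(hf_state_dict, "/tmp/hf-llama3/pytorch_model.bin")
```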
But as I was looking into the HF code for this, I realized there might be additional piping needed here to actually get this up and running with `from_pretrained()`.
Great! I will play around with the functions you mentioned to see if it is possible. If not, I will keep an eye on Huggingface's PR to see if that functionality can be applied, even if it is just an example script provided in a doc somewhere. You may already be aware that loading a model with `from_pretrained()` involves more than just the converted weights.
@calmitchell617 this is great feedback! Let me take a closer look at this. Do you think we'll need to do something different than creating hf-format checkpoints (once these are available)? For llama2, I think this works OOTB. We did verify interop with llama.cpp, for example. But let me take a closer look at the inference support within HF as well. Again, thanks so much for the feedback on this!
I do think an extra processing step is required. Here are exactly the steps I took to download and fine-tune Llama 2 with torchtune, then process it so that it loads successfully with `from_pretrained()`.

Download Llama 2 in a non-HF format via the torchtune CLI:
Fine-tune Llama 2 with torchtune, again using the CLI:
Convert the fine-tuned model to a format that can be loaded with `from_pretrained()`.

Last, test the conversion by running a Gradio chatbot on the fine-tuned/converted model. It worked as expected. As mentioned before, the conversion was done with Huggingface's existing Llama 2 conversion function. A crucial thing to note is that I have not been able to load a checkpoint saved with torchtune's hf checkpointer with `from_pretrained()`.
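(As a quick sanity check, something like the following can be used to smoke-test the converted checkpoint. The model directory is a placeholder, and this is only a sketch of the kind of test described above, not the exact chatbot setup.)

```python
# Minimal smoke test of a converted checkpoint; the path is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/tmp/llama2-7b-hf"  # directory produced by the conversion step
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```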
So, the problem is already solved for Llama 2, but of course, everyone wants Llama 3 :-) There may be some pre-release code in the Huggingface repo that adds Llama 3 support to that conversion function, which I will try out. If that goes well, I will post a full reproducible example. From there, we can see if it's worth including the script as an example, baking it into a helper function, or even building it into the checkpointer as an option.
This is awesome info! I'd love to discuss baking this into the checkpointer directly, since that was the intent of the HF checkpointer to begin with :)
Great. This issue is important to me, so I will work on it tomorrow while keeping the goal of having it work OOTB with the checkpointer in mind. Happy to discuss via email or video call anytime. Thanks again for your attention on this.
Actually, I'd love to do a quick call on this and figure it out! Mind sharing your email or pinging me on discord (@ KK on the discord channel) so we can set this up? I really appreciate all of the effort in figuring this out - really awesome!
@calmitchell617 I took a look at the code pointer above, and it seems like there's just a little bit of JSON wrangling needed to make this work. The model checkpoint itself doesn't need any changes. Let me know if you disagree.
In addition to JSON wrangling, I believe HF's model conversion code is:
I tested the code in HF's PR to add Llama 3 support to their model conversion function. After a few small alterations, it worked fine. So, it seems like we at least have a blueprint to follow to include that functionality in this repo.
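(For context on what that conversion does beyond renaming keys and writing config JSON: one notable step in HF's Llama conversion script is permuting the q/k projection weights to match transformers' rotary-embedding layout. The sketch below is adapted from memory of that script, so treat the exact details as approximate.)

```python
import torch


def permute_for_hf_rope(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Reorder the rows of a q/k projection so transformers' rotary-embedding
    # implementation sees the head dimensions in the order it expects.
    return (
        w.view(n_heads, dim1 // n_heads // 2, 2, dim2)
        .transpose(1, 2)
        .reshape(dim1, dim2)
    )


# Example shapes for Llama 3 8B's k_proj: 8 KV heads, head_dim 128, hidden dim 4096.
k_proj = torch.randn(8 * 128, 4096)
k_proj_hf = permute_for_hf_rope(k_proj, n_heads=8, dim1=8 * 128, dim2=4096)
print(k_proj_hf.shape)  # torch.Size([1024, 4096])
```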
Noob question: for converting the fine-tuned model to GGUF for llama.cpp, do I need to convert it to HF format first?
@apthagowda97, you will most likely need to convert to HF first, as that is probably the format whatever tool you're using to convert is expecting.
So I do think llama.cpp's convert script supports the meta format (I've done this many times for llama2, for example). Here's the code. The caveat is that they have an explicit check there for the expected files. The only question I'm not sure about is whether they support tiktoken or not. You can give it a whirl. Just make sure you have the tokenizer model file in the same folder.
@kartikayk, following our discussion, here is a full reproducible series of steps that I took to download and convert Llama 3 to an HF format that can be read by `from_pretrained()`.

@apthagowda97, would you give it a try and let me know if it works for your use case?

Prerequisites:
- Empty dir
- torchtune installed
- transformers checked out to the correct branch (and installed)
Downloaded Llama 3 8B Instruct:
Convert to HF format

Put the code in this gist (mostly copy/pasted from this PR, with a few changes to how paths are handled) into a file, then run it. Command:
Now the model should work

Now you should be able to load the model with `from_pretrained()`.
And run the file to show that we can now load the model with `from_pretrained()`:
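(The original test file didn't survive formatting; below is a sketch of what such a file might look like, with a placeholder path for the converted model.)

```python
# Hypothetical test file; the converted-model directory is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/tmp/llama3-8b-instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Say hello in one short sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```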
@calmitchell617 this is AWESOME! Thanks so much for the detailed instructions.
@calmitchell617 It works!! Now that the PR is merged, we can use it directly. @kartikayk I tried directly converting from Meta to GGUF like you suggested, by placing the tokenizer in the same folder, but the answer quality is bad (I am guessing something related to the tokenizer not working properly).
It's not working for the 70B model :/
Thanks! It works for the Llama3 8b instruct conversion. Saving a LlamaTokenizerFast to /Users/models/converted-llama3.
@hugebeanie, running the conversion, I observed the process using a peak of 143.2 GB of CPU RAM on my machine. So, you will realistically need a few more GB than that. One thing you can do is increase the swap memory of your system. I have done this before; you just need to Google (or ask an LLM) how to increase swap on your system. This will slow things down, but as long as you have a fast SSD, it should finish in a somewhat reasonable time.
@monk1337, what part are you having an issue with? I just ran the steps above without any issue for the 70B model. If you're having trouble running torchtune on the 70B model, you just have to alter the configs a bit to get it to work. I have done this, and can provide an example if that's your issue.
Hello, did anyone here manage to convert the finetuned ckpts into something that can be loaded with HF's `from_pretrained()`?
Also curious here - I want to be able to take the final checkpoints from finetuning and run inference on them using HF's `from_pretrained()`.
Hi folks, to follow up on this: we now support integration with PEFT as of #933 (thanks to @BenjaminBossan for the help reviewing these changes). Whenever running a LoRA fine-tune with our HF checkpointer, we will save adapter weights and config in a format that can be loaded into PEFT. The example usage would be as follows.

Run LoRA fine-tune with torchtune CLI:
Load the fine-tuned adapter weights into a PEFT model, with the same base model from the hub:
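(The snippet from this comment was lost in formatting; a sketch of the described usage, with placeholder model names and paths, would look roughly like this.)

```python
# Sketch of loading torchtune-saved LoRA adapter weights into PEFT.
# The base model ID and adapter directory below are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Directory where torchtune's HF checkpointer wrote the adapter weights and config.
peft_model = PeftModel.from_pretrained(base_model, "/tmp/lora_finetune_output")
```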
To be clear, this is not quite the same as proper HF `from_pretrained()` support for the full fine-tuned model. Hopefully this helps unblock folks using LoRA; we are working to get proper transformers integration as well.
I believe this should now be fully addressed after #2074, and would point folks to this section of our end-to-end tutorial. So I am going to belatedly close this issue.
I'm finding this repo to be a user-friendly, extensible, memory-efficient solution for training/fine-tuning models. However, when it comes to inference, there is a usability gap that could be solved by converting the model into a format that can be loaded by HF's `from_pretrained()` function.

The specific thing I want to do is load a model fine-tuned with torchtune into a Gradio chatbot, complete with token streaming. I imagine many other downstream tasks would be made easier with this functionality as well.

Would it be reasonable to add the following options to the checkpointer?

- An option to save the full model in a format that can be loaded with `from_pretrained()`.
- An option to save LoRA adapters in a format that can be loaded with `get_peft_model()`.

If this seems like a valid addition, and isn't a huge lift, I would be happy to give it a try.
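(To make the Gradio use case above concrete, here is a hedged sketch of the desired end state: a converted checkpoint loaded with `from_pretrained()` and served in a Gradio chatbot with token streaming. The model path is a placeholder and nothing here is part of torchtune itself.)

```python
# Sketch: serve a converted (HF-format) fine-tuned checkpoint in a streaming
# Gradio chatbot. The model directory is a placeholder.
from threading import Thread

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_dir = "/tmp/finetuned-llama-hf"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")


def chat(message, history):
    # Stream tokens back to the UI as they are generated.
    inputs = tokenizer(message, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    thread = Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256),
    )
    thread.start()
    partial = ""
    for token in streamer:
        partial += token
        yield partial


gr.ChatInterface(chat).launch()
```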