How to save a trained model so it can be loaded with HF from_pretrained()? #832
After some initial research, the functionality I'm imagining appears to already be implemented for Llama 2 in Huggingface's Llama weight-conversion script. I opened an issue to add support for Llama 3 to that function. I successfully converted a Llama 2 model saved with the meta checkpointer to a format that can be loaded with `from_pretrained()`.
@calmitchell617 thanks for opening this issue! Actually, we do support HF formats directly in torchtune by using this function (more details here). I believe this should work OOTB for the safetensors files available in the llama3 repo, but I can confirm with you in a bit. Generally, we've designed torchtune to be state-dict invariant, so the format of the input checkpoint is the format we write back out. The reason is exactly what you mentioned above, i.e. better interop with other tools in the ecosystem. Let me know if this helps.
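For anyone landing here later, here is a minimal sketch of what using the HF checkpointer directly might look like. The module path, class name, and constructor arguments are assumptions based on recent torchtune versions and may differ; in practice the checkpointer is usually configured via the recipe YAML rather than instantiated by hand.

```python
# Hedged sketch: instantiating torchtune's HF checkpointer directly.
# Module path, class name, and argument names are assumptions and may
# differ between torchtune versions; normally this lives in the recipe config.
from torchtune.utils import FullModelHFCheckpointer

checkpointer = FullModelHFCheckpointer(
    checkpoint_dir="/tmp/Meta-Llama-3-8B-Instruct",  # placeholder path
    checkpoint_files=[
        "model-00001-of-00004.safetensors",
        "model-00002-of-00004.safetensors",
        "model-00003-of-00004.safetensors",
        "model-00004-of-00004.safetensors",
    ],
    model_type="LLAMA3",
    output_dir="/tmp/finetuned-llama3",  # placeholder path
)

# Because the checkpointer is state-dict invariant, checkpoints are read in
# HF format here and written back out in HF format after fine-tuning.
state_dict = checkpointer.load_checkpoint()
```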
The point about LoRA adapters is a great one! I had an offline chat with @BenjaminBossan about this at some point. Let me follow up on this and see if we can build more interop here with PEFT.
@kartikayk, thanks for the quick follow-up. A PEFT integration would be very handy, but the other thing is more pressing for me right now. I did come across that function you mentioned, but am still not sure how to accomplish my use case. Maybe I am not understanding an existing way of doing things. Instead of prescribing a solution, let me try to be really specific with my use case. Hopefully I'm just making things harder than they need to be, and you will have an easy solution. I want to:
Two issues I see with step 3 of that list are:
Again, thank you for your fast responses. Hopefully I'm just not seeing an existing solution.
@calmitchell617 I just tried what I had mentioned and realized the folly of what I said above :) So I was under the impression that the "safetensors" files are compatible with `from_pretrained()` out of the box, but that doesn't seem to be the case.
@calmitchell617 ok I think I can convert the weights around correctly using the following code:
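(The code block from this comment didn't survive; below is a rough sketch of the kind of weight conversion being described, assuming torchtune's `convert_weights` helpers. The module path, function names, and signatures are assumptions and may differ between versions.)

```python
# Rough sketch of converting a meta-format checkpoint to HF key names.
# Assumes torchtune exposes meta_to_tune / tune_to_hf in
# torchtune.models.convert_weights; names and signatures may differ by version.
import torch
from torchtune.models import convert_weights

# Checkpoint written by torchtune's meta checkpointer (placeholder path).
meta_state_dict = torch.load(
    "/tmp/Meta-Llama-3-8B-Instruct/meta_model_0.pt", map_location="cpu"
)

# Meta key names -> torchtune's native key names.
tune_state_dict = convert_weights.meta_to_tune(meta_state_dict)

# torchtune key names -> HF (transformers) key names.
# The head counts and hidden dim shown are Llama 3 8B values.
hf_state_dict = convert_weights.tune_to_hf(
    tune_state_dict, num_heads=32, num_kv_heads=8, dim=4096
)

torch.save(hf_state_dict, "/tmp/hf-llama3/pytorch_model.bin")
```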
But as I was looking into the HF code for this, I realized there might be additional piping needed here to actually get this up and running with `from_pretrained()`.
Great! I will play around with the functions you mentioned to see if it is possible. If not, I will keep an eye on Huggingface's PR to see if that functionality can be applied, even if it is just an example script provided in a doc somewhere. You may already be aware that loading a model with `from_pretrained()` involves more than just the converted weights.
@calmitchell617 this is great feedback! Let me take a closer look at this. Do you think we'll need to do something different than creating hf-format checkpoints (once these are available)? For llama2, I think this works OOTB. We did verify interop with llama.cpp, for example. But let me take a closer look at the inference support within HF as well. Again, thanks so much for the feedback on this!
I do think an extra processing step is required. Here are exactly the steps I took to download and fine-tune Llama 2 with torchtune, then process it so that it loads successfully with `from_pretrained()`.

Download Llama 2 in a non-HF format via the torchtune CLI:
Fine-tune Llama 2 with torchtune, again using the CLI:
Convert the fine-tuned model to a format that can be loaded with `from_pretrained()`.

Last, test the conversion by running a Gradio chatbot on the fine-tuned/converted model. It worked as expected. As mentioned before, the conversion was done with Huggingface's existing Llama 2 conversion function. A crucial thing to note is that I have not been able to load a checkpoint saved with torchtune's hf checkpointer with `from_pretrained()`.
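(As a quick sanity check, something like the following can be used to smoke-test the converted checkpoint. The model directory is a placeholder, and this is only a sketch of the kind of test described above, not the exact chatbot setup.)

```python
# Minimal smoke test of a converted checkpoint; the path is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/tmp/llama2-7b-hf"  # directory produced by the conversion step
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```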
So, the problem is already solved for Llama 2, but of course, everyone wants Llama 3 :-) There may be some pre-release code in the Huggingface repo that adds Llama 3 support to that conversion function, which I will try out. If that goes well, I will post a full reproducible example. From there, we can see if it's worth including the script as an example, baking it into a helper function, or even building it into the checkpointer as an option.
This is awesome info! I'd love to discuss baking this into the checkpointer directly, since that was the intent of the HF checkpointer to begin with :)
Great. This issue is important to me, so I will work on it tomorrow while keeping the goal of having it work OOTB with the checkpointer in mind. Happy to discuss via email or video call anytime. Thanks again for your attention on this.
Actually, I'd love to do a quick call on this and figure it out! Mind sharing your email or pinging me on discord (@ KK on the discord channel) so we can set this up? I really appreciate all of the effort in figuring this out - really awesome!
@calmitchell617 I took a look at the code pointer above, and it seems like there's just a little bit of JSON wrangling needed to make this work. The model checkpoint itself doesn't need any changes. Let me know if you disagree.
In addition to JSON wrangling, I believe HF's model conversion code is:
I tested the code in HF's PR to add Llama 3 support to their model conversion function. After a few small alterations, it worked fine. So, it seems like we at least have a blueprint to follow to include that functionality in this repo.
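(For context on what that conversion does beyond renaming keys and writing config JSON: one notable step in HF's Llama conversion script is permuting the q/k projection weights to match transformers' rotary-embedding layout. The sketch below is adapted from memory of that script, so treat the exact details as approximate.)

```python
import torch


def permute_for_hf_rope(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Reorder the rows of a q/k projection so transformers' rotary-embedding
    # implementation sees the head dimensions in the order it expects.
    return (
        w.view(n_heads, dim1 // n_heads // 2, 2, dim2)
        .transpose(1, 2)
        .reshape(dim1, dim2)
    )


# Example shapes for Llama 3 8B's k_proj: 8 KV heads, head_dim 128, hidden dim 4096.
k_proj = torch.randn(8 * 128, 4096)
k_proj_hf = permute_for_hf_rope(k_proj, n_heads=8, dim1=8 * 128, dim2=4096)
print(k_proj_hf.shape)  # torch.Size([1024, 4096])
```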
Noob question: for converting the fine-tuned model to GGUF for llama.cpp, do I need to convert it to HF format first?
@apthagowda97, you will most likely need to convert to HF first, as that is probably the format whatever tool you're using to convert is expecting.
So I do think llama.cpp's convert script supports the meta format (I've done this many times for llama2, for example). Here's the code. The caveat is that they have an explicit check there for the expected files. The only question I'm not sure about is whether they support tiktoken or not. You can give it a whirl. Just make sure you have the tokenizer model file in the same folder.
@kartikayk, following our discussion, here is a full reproducible series of steps that I took to download and convert Llama 3 to an HF format that can be read by `from_pretrained()`.

@apthagowda97, would you give it a try and let me know if it works for your use case?

Prerequisites:
- Empty dir
- torchtune installed
- transformers checked out to the correct branch (and installed)
Downloaded Llama 3 8B Instruct:
Convert to HF format

Put the code in this gist (mostly copy/pasted from this PR, with a few changes to how paths are handled) into a file, then run it. Command:
Now the model should work

Now you should be able to load the model with `from_pretrained()`.
And run the file to show that we can now load the model with `from_pretrained()`:
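(The original test file didn't survive formatting; below is a sketch of what such a file might look like, with a placeholder path for the converted model.)

```python
# Hypothetical test file; the converted-model directory is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/tmp/llama3-8b-instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Say hello in one short sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```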
@calmitchell617 this is AWESOME! Thanks so much for the detailed instructions.
@calmitchell617 It works!! Now that the PR is merged, we can use it directly. @kartikayk I tried directly converting from Meta to GGUF like you suggested, by placing the tokenizer in the same folder, but the answer quality is bad (I am guessing something related to the tokenizer not working properly).
It's not working for the 70B model :/
Thanks! It works for the Llama3 8b instruct conversion. Saving a LlamaTokenizerFast to /Users/models/converted-llama3.
@hugebeanie, running the conversion, I observed the process using a peak of 143.2 GB of CPU RAM on my machine. So, you will realistically need a few more GB than that. One thing you can do is increase the swap memory of your system. I have done this before; you just need to Google (or ask an LLM) how to increase swap on your system. This will slow things down, but as long as you have a fast SSD, it should finish in a somewhat reasonable time.
@monk1337, what part are you having an issue with? I just ran the steps above without any issue for the 70B model. If you're having trouble running torchtune on the 70B model, you just have to alter the configs a bit to get it to work. I have done this, and can provide an example if that's your issue.
Hello, did anyone here manage to convert the finetuned ckpts into something that can be loaded with HF's `from_pretrained()`?
Also curious here - I want to be able to take the final checkpoints from finetuning and run inference on them using HF's `from_pretrained()`.
Hi folks, to follow up on this: we now support integration with PEFT as of #933 (thanks to @BenjaminBossan for the help reviewing these changes). Whenever running a LoRA fine-tune with our HF checkpointer, we will save adapter weights and config in a format that can be loaded into PEFT. The example usage would be as follows.

Run LoRA fine-tune with torchtune CLI:
Load the fine-tuned adapter weights into a PEFT model, with the same base model from the hub:
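(The snippet from this comment was lost in formatting; a sketch of the described usage, with placeholder model names and paths, would look roughly like this.)

```python
# Sketch of loading torchtune-saved LoRA adapter weights into PEFT.
# The base model ID and adapter directory below are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Directory where torchtune's HF checkpointer wrote the adapter weights and config.
peft_model = PeftModel.from_pretrained(base_model, "/tmp/lora_finetune_output")
```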
To be clear, this is not quite the same as proper HF `from_pretrained()` support for the full fine-tuned model. Hopefully this helps unblock folks using LoRA; we are working to get proper transformers integration as well.
I believe this should now be fully addressed after #2074, and would point folks to this section of our end-to-end tutorial. So I am going to belatedly close this issue.
I'm finding this repo to be a user-friendly, extensible, memory-efficient solution for training/fine-tuning models. However, when it comes to inference, there is a usability gap that could be solved by converting the model into a format that can be loaded by HF's `from_pretrained()` function.

The specific thing I want to do is load a model fine-tuned with torchtune into a Gradio chatbot, complete with token streaming. I imagine many other downstream tasks would be made easier with this functionality as well.

Would it be reasonable to add the following options to the checkpointer?

- An option to save the full model in a format that can be loaded with `from_pretrained()`.
- An option to save LoRA adapters in a format that can be loaded with `get_peft_model()`.

If this seems like a valid addition, and isn't a huge lift, I would be happy to give it a try.
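(To make the Gradio use case above concrete, here is a hedged sketch of the desired end state: a converted checkpoint loaded with `from_pretrained()` and served in a Gradio chatbot with token streaming. The model path is a placeholder and nothing here is part of torchtune itself.)

```python
# Sketch: serve a converted (HF-format) fine-tuned checkpoint in a streaming
# Gradio chatbot. The model directory is a placeholder.
from threading import Thread

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_dir = "/tmp/finetuned-llama-hf"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")


def chat(message, history):
    # Stream tokens back to the UI as they are generated.
    inputs = tokenizer(message, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    thread = Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256),
    )
    thread.start()
    partial = ""
    for token in streamer:
        partial += token
        yield partial


gr.ChatInterface(chat).launch()
```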