
How to load ckpt files generated by torchtune.utils.FullModelHFCheckpointer into HF models #878

Closed
BMPixel opened this issue Apr 26, 2024 · 2 comments



BMPixel commented Apr 26, 2024

When torchtune.utils.FullModelHFCheckpointer loads Hugging Face models it reads *.safetensors files, but it writes its own checkpoints out as *.pt files. Those *.pt files cannot be loaded with the from_pretrained function.

Is there a way to convert the *.pt checkpoint files into something like pytorch_model.bin or *.safetensors?

This issue is similar to #832, which seems to focus on converting Meta checkpoint files like consolidated.xx.pth. I am wondering whether it would be good to have a CLI tool for converting checkpoints between the Meta, PyTorch, and Hugging Face formats. That would be helpful.


BMPixel commented Apr 26, 2024

I've figured it out. I'll post my understanding here in case anyone has the same question.

First of all, *.bin and *.pt are the same file format: both are processed with torch.load/torch.save. *.safetensors is simply another format used by Hugging Face. All of them hold a model's state_dict.

torchtune.utils.FullModelHFCheckpointer creates .pt files that correspond to the safetensors files, just in a different serialization format. Hugging Face reads them all the same, so the simplest way to from_pretrained your checkpoints is to edit model.safetensors.index.json like this:

{
  "metadata": {
    "total_size": 32121044992
  },
  "weight_map": {
    "lm_head.weight": "hf_model_0007_0.pt",
    "model.embed_tokens.weight": "hf_model_0001_0.pt",
    "model.layers.0.input_layernorm.weight": "hf_model_0001_0.pt"
... 
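That index edit can also be scripted. Below is a minimal sketch, assuming the torchtune .pt shards sort into the same order as the original .safetensors shards they replace; the function name and the `hf_model_*.pt` glob pattern are my own choices, not part of torchtune:

```python
import json
from pathlib import Path

def retarget_index(ckpt_dir: str, pattern: str = "hf_model_*.pt") -> None:
    """Point model.safetensors.index.json at the torchtune .pt shards.

    Assumes a one-to-one, order-preserving mapping between the original
    .safetensors shards and the .pt shards when both are sorted by name.
    """
    ckpt_path = Path(ckpt_dir)
    index_path = ckpt_path / "model.safetensors.index.json"
    index = json.loads(index_path.read_text())

    old_shards = sorted(set(index["weight_map"].values()))
    new_shards = sorted(p.name for p in ckpt_path.glob(pattern))
    if len(old_shards) != len(new_shards):
        raise RuntimeError("shard count mismatch between index and .pt files")

    # Map each original shard name to the .pt shard at the same sorted position.
    rename = dict(zip(old_shards, new_shards))
    index["weight_map"] = {k: rename[v] for k, v in index["weight_map"].items()}
    index_path.write_text(json.dumps(index, indent=2))
```

If the sorted orders don't actually line up for your checkpoint, the mapping has to be written by hand instead, as in the JSON above.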

from_pretrained looks for model.safetensors.index.json and reads all the checkpoint shards it lists. Furthermore, if you want a single pytorch_model.bin out of the *.pt files, you can merge the dicts from all the *.pt files and save the result as pytorch_model.bin:

import glob
import torch
from tqdm import tqdm

pt_to_merge = glob.glob("outputs/trial_one/hf_model_000*_1.pt")
state_dicts = [torch.load(p) for p in tqdm(pt_to_merge)]
merged_state_dict = {k: v for d in state_dicts for k, v in d.items()}
torch.save(merged_state_dict, "pytorch_model.bin")

I hope this helps!

@kartikayk
Contributor

@BMPixel absolutely spot on! Thanks so much for the detailed comment on this - all of this makes sense to me.

You can simply rename the .pt files to .bin and things should work. .pt is a more common pytorch extension and so that's what we output. Great point about modifying model.safetensors.index.json!
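For anyone else reading, that rename can be done in a few lines. A minimal sketch (the function name and directory layout are hypothetical); note that whichever index file from_pretrained consults must also list the new .bin names:

```python
from pathlib import Path

def pt_to_bin(ckpt_dir: str) -> list:
    """Rename every hf_model_*.pt shard in ckpt_dir to a .bin file."""
    renamed = []
    for p in sorted(Path(ckpt_dir).glob("hf_model_*.pt")):
        # e.g. hf_model_0001_0.pt -> hf_model_0001_0.bin
        target = p.with_suffix(".bin")
        p.rename(target)
        renamed.append(target.name)
    return renamed
```

Usage would look like `pt_to_bin("outputs/trial_one")`, pointing at wherever torchtune wrote the checkpoints.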

Did you need any other changes, or did the JSON change work on its own?
