Allow conversion of Llama / Mistral HF models #6144

Merged (8 commits) on Mar 29, 2024

convert-hf-to-gguf.py: 63 changes (59 additions, 4 deletions)
@cebtenzzre (Collaborator) commented on Mar 18, 2024:


Why not just use:

@Model.register("LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM")

on the existing MixtralModel (call it LlamaModel)? I don't see a point in supporting Mistral in this script without supporting Llama, and these classes are identical so they can be merged.
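For readers unfamiliar with the script, the registration being discussed maps one or more HF architectures strings from config.json to a single converter class. A minimal sketch of how such a multi-name decorator can work (the names _model_classes and from_model_architecture are assumptions for illustration, not a quote of the script):

# Sketch of a multi-name registration decorator (assumed shape; the real
# Model.register in convert-hf-to-gguf.py may differ in details).
class Model:
    _model_classes: dict[str, type] = {}

    @classmethod
    def register(cls, *names: str):
        assert names, "at least one architecture name is required"

        def wrapper(model_cls: type) -> type:
            for name in names:
                cls._model_classes[name] = model_cls  # one entry per HF name
            return model_cls

        return wrapper

    @classmethod
    def from_model_architecture(cls, arch: str) -> type:
        try:
            return cls._model_classes[arch]
        except KeyError:
            raise NotImplementedError(f"Architecture {arch!r} is not supported") from None

# With that in place, a single class can serve several HF architectures:
@Model.register("LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM")
class LlamaModel(Model):
    pass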

Contributor (PR author) replied:

Sure, we can do that. I tried to mimic what I saw in the existing entries.

In my opinion, it could be interesting to use different architectures for Mistral (and Mixtral) for informative purposes. If you click on the GGUF information for any Mistral file on the Hugging Face Hub, there's nothing that refers to mistral except the filename. But that would also require additional changes, so happy to use the same entry for the three of them!
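Purely to illustrate that alternative (it is not what this PR does): registering Mistral under its own architecture would look roughly like the sketch below, except that gguf.MODEL_ARCH has no MISTRAL member; adding one, plus the matching support on the llama.cpp side, is the "additional changes" referred to above.

# Hypothetical alternative, NOT part of this PR: a separate registration so that
# general.architecture in the GGUF metadata would read "mistral" on the Hub.
# MODEL_ARCH.MISTRAL is an assumed enum member that does not exist in gguf-py.
@Model.register("MistralForCausalLM")
class MistralModel(LlamaModel):
    model_arch = gguf.MODEL_ARCH.MISTRAL  # hypothetical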

@@ -23,7 +23,7 @@
sys.path.insert(1, str(Path(__file__).parent / 'gguf-py'))
import gguf

-from convert import HfVocab
+from convert import HfVocab, permute


###### MODEL DEFINITIONS ######
@@ -1043,12 +1043,67 @@ def set_gguf_parameters(self):
        self.gguf_writer.add_layer_norm_eps(self.find_hparam(["layer_norm_eps", "norm_eps"]))


-@Model.register("MixtralForCausalLM")
-class MixtralModel(Model):
+@Model.register("LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM")
+class LlamaModel(Model):
    model_arch = gguf.MODEL_ARCH.LLAMA

    def set_vocab(self):
-        self._set_vocab_sentencepiece()
+        self._set_vocab_hf()
Contributor (PR author) commented on the set_vocab change:

While testing conversion with fine-tuned models I ran into this problem: #6320. If that PR is merged, then we can also use _set_vocab_sentencepiece() here.
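A possible way to reconcile the two vocab paths once that PR lands, purely as a sketch of the idea (not part of this diff): try the SentencePiece vocab first and fall back to the HF tokenizer.

# Illustrative only: prefer tokenizer.model when present, otherwise fall back to
# the HF tokenizer (assumes _set_vocab_sentencepiece raises FileNotFoundError
# when tokenizer.model is missing).
def set_vocab(self):
    try:
        self._set_vocab_sentencepiece()
    except FileNotFoundError:
        self._set_vocab_hf()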


    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        hparams = self.hparams
        self.gguf_writer.add_block_count(hparams["num_hidden_layers"])
        self.gguf_writer.add_rope_dimension_count(hparams["hidden_size"] // hparams["num_attention_heads"])

    # Same as super class, but permuting q_proj, k_proj
    def write_tensors(self):
        block_count = self.hparams.get("n_layers", self.hparams.get("num_hidden_layers", self.hparams.get("n_layer")))
        tensor_map = gguf.get_tensor_name_map(self.model_arch, block_count)
        n_head = self.hparams.get("num_attention_heads")
        n_kv_head = self.hparams.get("num_key_value_heads")
        for name, data_torch in self.get_tensors():
            # we don't need these
            if name.endswith((".attention.masked_bias", ".attention.bias", ".attention.rotary_emb.inv_freq")):
                continue

            old_dtype = data_torch.dtype

            # convert any unsupported data types to float32
            if data_torch.dtype not in (torch.float16, torch.float32):
                data_torch = data_torch.to(torch.float32)

            if name.endswith(("q_proj.weight")):
                data_torch = permute(data_torch, n_head, n_head)
            if name.endswith(("k_proj.weight")):
                data_torch = permute(data_torch, n_head, n_kv_head)

            data = data_torch.squeeze().numpy()

            # map tensor names
            new_name = tensor_map.get_name(name, try_suffixes=(".weight", ".bias"))
            if new_name is None:
                print(f"Can not map tensor {name!r}")
                sys.exit()

            n_dims = len(data.shape)
            data_dtype = data.dtype

            # if f32 desired, convert any float16 to float32
            if self.ftype == 0 and data_dtype == np.float16:
                data = data.astype(np.float32)

            # TODO: Why can't we use these float16 as-is? There should be no reason to store float16 as float32
            if self.ftype == 1 and data_dtype == np.float16 and n_dims == 1:
                data = data.astype(np.float32)

            # if f16 desired, convert any float32 2-dim weight tensors to float16
            if self.ftype == 1 and data_dtype == np.float32 and name.endswith(".weight") and n_dims == 2:
                data = data.astype(np.float16)

            print(f"{new_name}, n_dims = {n_dims}, {old_dtype} --> {data.dtype}")

            self.gguf_writer.add_tensor(new_name, data)
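
For reference, the permute helper imported from convert.py at the top of this diff regroups the rows of q_proj/k_proj so that the two rotary-embedding halves of each head are interleaved the way the GGUF LLAMA layout expects. A rough sketch, paraphrased from memory (see convert.py for the authoritative version):

# Approximate shape of convert.py's permute (not a verbatim copy). It splits
# each head's rows into two rotary halves and swaps the grouping; the same
# reshape/swapaxes calls work on both numpy arrays and torch tensors.
def permute(weights, n_head: int, n_head_kv: int):
    if n_head_kv is not None and n_head != n_head_kv:
        n_head = n_head_kv
    return (weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:])
                   .swapaxes(1, 2)
                   .reshape(weights.shape))

This is also why write_tensors passes (n_head, n_head) for q_proj but (n_head, n_kv_head) for k_proj: with grouped-query attention the key projection has fewer heads than the query projection.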


@Model.register("MiniCPMForCausalLM")