Port gpt2 transformers checkpoint #1704

Merged 12 commits into keras-team:master on Jul 29, 2024
Conversation

cosmo3769 (Contributor)

Hi @mattdangerw @ariG23498,

Ported the GPT2 transformers checkpoint into KerasNLP. Please check. Thank you!

@cosmo3769 (Contributor Author)

@ariG23498, thank you for your amazing reference repository: ariG23498/keras-nlp-hugging-face-integration 🙏

@mattdangerw (Member)

Thanks! Why do we need to add a new public-facing argument hf_key_prefix? Ideally we would keep our exposed API surface minimal.

@ariG23498 (Collaborator)

@mattdangerw

I have noticed that some models' checkpoint key names carry a prefix that breaks model porting.

If you look at distilgpt2, you will find that every key carries the prefix transformer. That prefix alone is enough to break the model porting code.

With hf_key_prefix I introduce a parameter that can be used to set the prefix while loading the model.

An example would be:

import keras_nlp

model = keras_nlp.models.GPT2CausalLM.from_preset(
    "hf://distilbert/distilgpt2",
    hf_key_prefix="transformer",
)

print(model.generate(["what is"], max_length=15))

@mattdangerw (Member)

Is there any other prefix we need to check for besides "transformer"? Can we write some code that either checks specifically for the "transformer" prefix or detects any prefix applied to all the weights? I'm not super familiar with the safetensors API, but there must be a way to list keys, right?

I think that'd be a lot more usable, and keep the API clean. In practice I don't think people will understand what went wrong, look up the safetensor contents, discover the prefix, and pass it in.
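
For reference, listing the keys is possible with the safetensors Python API. A minimal sketch, assuming a locally downloaded model.safetensors and the numpy binding ("np"):

from safetensors import safe_open

# Open the file lazily; no tensors are loaded just to list the key names.
with safe_open("model.safetensors", framework="np") as f:
    keys = list(f.keys())

# For distilgpt2 these look like "transformer.h.0.attn.c_attn.weight",
# i.e. every key carries a "transformer." prefix.
print(keys[:5])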

@ariG23498 (Collaborator)

@mattdangerw I completely agree with your points.

Upon talking to Matt, we think it would be easiest to apply a regex to capture the prefix, if any. The transformers library captures the prefix and then removes it while loading its models; unfortunately, we would not have that information here.

To be precise, this is where we would like to apply the regex:
https://github.com/keras-team/keras-nlp/blob/b6877df38d5ddadcd1f7c9c30498b933b4b6ee30/keras_nlp/src/utils/transformers/safetensor_utils.py#L50

I think that would remove the need to list and check keys altogether.

WDYT?
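
For illustration, a rough sketch of what such a regex could look like: detect the prefix from any key that contains a GPT-2 style block index ("h.<n>.") and strip it from every key. The helper name and the exact pattern are assumptions, not the code in this PR.

import re

def detect_prefix(all_keys):
    # Treat everything before the first GPT-2 block pattern "h.<n>." as
    # the checkpoint-wide prefix (e.g. "transformer." for distilgpt2).
    for key in all_keys:
        match = re.match(r"^(.*?)h\.\d+\.", key)
        if match:
            return match.group(1)
    return ""

keys = ["transformer.wte.weight", "transformer.h.0.attn.c_attn.weight"]
prefix = detect_prefix(keys)  # "transformer."
stripped = [k[len(prefix):] if k.startswith(prefix) else k for k in keys]
print(stripped)  # ['wte.weight', 'h.0.attn.c_attn.weight']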

@cosmo3769 (Contributor Author) commented Jul 22, 2024

> Upon talking to Matt, we think it would be easiest to apply a regex to capture the prefix, if any.

Yeah, something like this demo. I am using a regex there to get the prefix up to layer_index; tested with the gpt2 model.

@ariG23498 @mattdangerw

@mattdangerw (Member)

We should make sure to handle both the sharded and single-file safetensor cases. I do think we could handle this in SafetensorLoader. Since we get the full list of keys either via file.keys() or via safetensor_config["weight_map"].keys(), we can use that to resolve the actual key name as needed. Let's try to keep the implementation simple.
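
A minimal sketch of that key-listing step, assuming the loader already knows whether a sharded index file was found (safetensor_config and safetensor_file are illustrative names, not the actual attributes of SafetensorLoader):

def list_checkpoint_keys(safetensor_config, safetensor_file):
    if safetensor_config is not None:
        # Sharded checkpoint: model.safetensors.index.json maps every
        # weight name to the shard file that stores it.
        return list(safetensor_config["weight_map"].keys())
    # Single-file checkpoint: the open safetensors handle lists the keys.
    return list(safetensor_file.keys())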

@mattdangerw (Member)

Also, please run the formatting script!

Comment on lines 46 to 51
def get_prefix(self, key, all_keys):
    for k in all_keys:
        if k.endswith(key) and k != key:
            prefix = k[: -len(key)]
            return prefix + key
    return key
@ariG23498 (Collaborator)

I like the implementation! WDYT @mattdangerw?

My 2 cents:

  • We should use better variable naming.
  • The name of the function is misleading.
  • Add a docstring here, so that we are aware of the problem and how it is solved (a rough sketch follows below).
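
For example, a hypothetical rewrite along those lines; the method name, variable names, and docstring wording are only suggestions:

def resolve_hf_key(self, hf_key, all_checkpoint_keys):
    """Return the stored checkpoint key that corresponds to `hf_key`.

    Some Hugging Face checkpoints (e.g. distilgpt2) store every weight
    under an extra prefix such as "transformer.". Look for a stored key
    that ends with `hf_key`, so callers can read weights without knowing
    the prefix in advance.
    """
    for checkpoint_key in all_checkpoint_keys:
        if checkpoint_key.endswith(hf_key) and checkpoint_key != hf_key:
            return checkpoint_key
    return hf_key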

@ariG23498 (Collaborator)

Come to think of it, I think a better approach would be to loop over all the keys at once and build a one-to-one mapping of the hf keys to the keras keys.

Return the map, and then use that map later (see the sketch after this list).

This bypasses the following:

  • Looping over all the keys multiple times
  • Calling this function multiple times
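
A rough sketch of that idea, with illustrative names (the prefix detection itself is assumed to happen elsewhere):

def build_key_map(all_hf_keys, prefix=""):
    # Map the bare key name used by the conversion code to the key
    # actually stored in the checkpoint; built once, used many times.
    return {
        (key[len(prefix):] if prefix and key.startswith(prefix) else key): key
        for key in all_hf_keys
    }

key_map = build_key_map(
    ["transformer.wte.weight", "transformer.h.0.mlp.c_fc.weight"],
    prefix="transformer.",
)
print(key_map["wte.weight"])  # "transformer.wte.weight"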

@cosmo3769 (Contributor Author)

> Come to think of it, I think a better approach would be to loop over all the keys at once and build a one-to-one mapping of the hf keys to the keras keys.

Makes sense. It will be efficient, and the real power of this will show up when there is a large number of keys. Mapping everything once instead of running the loop every time also reduces the time complexity.

@mattdangerw (Member) commented Jul 24, 2024

Yeah, caching the prefix in some form sounds good.

Is the prefix always the same for all weights? If so, we could probably do something like this...

def get_prefix(self, key, dict_like):
    if self.prefix is not None:
        return self.prefix
    keys = dict_like.keys()
    if key in keys:
        self.prefix = ""
    else:
        # Some code to figure out the correct prefix.
        self.prefix = ...
    return self.prefix

@mattdangerw (Member) left a comment

Looking good! Mostly minor nits, but the comment on the conversion (don't hardcode the query/key/value size) is important.

@mattdangerw (Member) left a comment

Thanks! LGTM

@mattdangerw added the kokoro:force-run (Runs Tests on GPU) label on Jul 29, 2024
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Jul 29, 2024
@mattdangerw merged commit cb49405 into keras-team:master on Jul 29, 2024
10 checks passed