Port mistral transformer checkpoint #1768
Conversation
Looks good overall! One comment.
Could you include a small Colab showing generation, just to verify this is working? We don't have numerics validation yet.
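For reference, the kind of generation check being asked for might look roughly like this (a sketch; it assumes the converter is reachable through `from_preset` with an `hf://` URI, as in the test added later in this PR):

```python
import keras_nlp

# Load the Hugging Face checkpoint through the KerasNLP conversion path.
causal_lm = keras_nlp.models.MistralCausalLM.from_preset(
    "hf://mistralai/Mistral-7B-v0.1"
)

# Smoke test: generate a short completion and eyeball the output.
print(causal_lm.generate("The capital of France is", max_length=30))
```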
"rope_max_wavelength": transformers_config["rope_theta"], | ||
"layer_norm_epsilon": transformers_config["rms_norm_eps"], | ||
"sliding_window": transformers_config["sliding_window"], | ||
"dtype": transformers_config["torch_dtype"], |
I don't think we should convert dtype. We don't for other models.
We will create a backbone with the default Keras floating point type unless the user supplies their own arg, but we don't restore the saved dtype policy by default.
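In other words (a sketch, assuming the usual `dtype` argument is forwarded by `from_preset`), the load precision is controlled by the caller rather than by the checkpoint:

```python
import keras_nlp

# Loads with the default Keras floating point type (typically float32).
backbone = keras_nlp.models.MistralBackbone.from_preset(
    "hf://mistralai/Mistral-7B-v0.1"
)

# Loads in bfloat16 only because the caller asked for it explicitly.
backbone_bf16 = keras_nlp.models.MistralBackbone.from_preset(
    "hf://mistralai/Mistral-7B-v0.1", dtype="bfloat16"
)
```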
Without dtype conversion, I am getting an error: DTypePromotionError: The DTypes <class 'numpy.dtypes.Float16DType'> and <class 'numpy.dtype[bfloat16]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is object.
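For context, the promotion failure can be reproduced outside the converter with plain NumPy (a minimal sketch, assuming `ml_dtypes` provides the bfloat16 NumPy dtype involved here):

```python
import numpy as np
import ml_dtypes

a = np.zeros(2, dtype=np.float16)
b = np.zeros(2, dtype=ml_dtypes.bfloat16)

# float16 and bfloat16 have no common promoted dtype, so combining them
# raises DTypePromotionError unless one side is cast first.
np.concatenate([a, b])
```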
Interesting. I think this is something we will have to solve during weight conversion, and not by sticking this value in the config. I will take a look.
@mattdangerw do you think this would be a better place to keep the check?
https://github.com/keras-team/keras-nlp/blob/f80fbfd0eaeee7a9e63a4c98a81ff8aba5506f3e/keras_nlp/src/utils/transformers/safetensor_utils.py#L97
We can check whether the dtypes match here -- if a conversion is needed, warn the user that a type conversion is happening at this stage to port the weights, and then continue?
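Something along these lines, for instance (a hypothetical sketch, not the actual safetensor_utils.py code; `assign_with_cast`, `keras_variable`, and `hf_tensor` are made-up names):

```python
import warnings

import keras


def assign_with_cast(keras_variable, hf_tensor):
    # Hypothetical helper: warn and cast when the checkpoint dtype differs
    # from the dtype of the Keras variable it is loaded into.
    target_dtype = keras_variable.dtype
    if str(hf_tensor.dtype) != str(target_dtype):
        warnings.warn(
            f"Converting weight from {hf_tensor.dtype} to {target_dtype} "
            "while porting the checkpoint."
        )
        hf_tensor = keras.ops.cast(hf_tensor, target_dtype)
    keras_variable.assign(hf_tensor)
```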
@ariG23498 Makes sense. 💡
I would actually think any type conversion should happen inside of the assign call. I tried removing this dtype line and could not reproduce the error. Is this only on a specific backend?
I don't think we need to warn that type conversion is happening. Loading a half precision save at full precision or vice versa is quite common.
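That is, something roughly like this (a hypothetical sketch, same caveats as the helper above; the cast lives next to the assign and no warning is emitted):

```python
import keras


def assign_hf_tensor(keras_variable, hf_tensor):
    # Hypothetical helper: cast to the destination variable's dtype at
    # assignment time, so nothing dtype-related needs to live in the config.
    if str(hf_tensor.dtype) != str(keras_variable.dtype):
        hf_tensor = keras.ops.cast(hf_tensor, keras_variable.dtype)
    keras_variable.assign(hf_tensor)
```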
So we can just remove this line, right? I'll give that a try and land if things look good.
Yeah, I checked it now by removing this line. It works.
class TestTask(TestCase):
    @pytest.mark.large
    def test_convert_tiny_preset(self):
        model = MistralCausalLM.from_preset("hf://mistralai/Mistral-7B-v0.1")
This is too big to run in our automated testing regularly. @ariG23498 can you detail what you did to make hf://ariG23498/tiny-gemma-test?
Here is detailed code for building a small test model and uploading it to the Hub.
@cosmo3769 could you take a look at it?
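For reference, the recipe is roughly as follows (a sketch using the transformers API; the dimensions and repo name are illustrative, and a tokenizer would still need to be uploaded alongside the weights):

```python
from transformers import MistralConfig, MistralForCausalLM

# A deliberately tiny Mistral so the test checkpoint is a few MB, not ~14 GB.
config = MistralConfig(
    vocab_size=32000,
    hidden_size=8,
    intermediate_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    num_key_value_heads=1,
    sliding_window=512,
)
model = MistralForCausalLM(config)

# Push the randomly initialized model and its config to the Hugging Face Hub.
model.push_to_hub("your-username/tiny-mistral-test")
```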
@ariG23498 Sure.
Main thing we need before we merge is a smaller test case. Left a comment on the big chain, though; still not sure exactly where things are breaking if you remove dtype from the config.
Added tiny-mistral test.
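Presumably the updated test swaps the 7B checkpoint for the tiny one, roughly like this (the Hub path is a placeholder, not the actual repo used in the PR, and the import paths are approximate):

```python
import pytest

from keras_nlp.models import MistralCausalLM
from keras_nlp.src.tests.test_case import TestCase


class TestTask(TestCase):
    @pytest.mark.large
    def test_convert_tiny_preset(self):
        # Placeholder path for the tiny test checkpoint on the Hub.
        model = MistralCausalLM.from_preset("hf://your-username/tiny-mistral-test")
        model.generate("What is Keras?", max_length=15)
```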
LGTM! Will merge after tests run.
Resolved the merge conflict.
The JAX failure is from #1783, but this one looks good. Pulling this in!
* ported mistral
* update test
* fix config
* fix typo
* switched float32 to float16
* tiny-mistral-test
* removed dtype config
Hi @mattdangerw @ariG23498,
Ported the Mistral transformers checkpoint to KerasNLP. Please check. Thank you!