Add a Causal LM model for Mistral #1429
Conversation
JAX complains about dynamic slicing when compiled with XLA. This is unavoidable since, at runtime, the slice of the current key/value array to use for that iteration is determined by `cache_update_index`, which is itself a JAX `TracedArray`. Any workaround would lead to using dynamic shapes at some point. Hence, I had to remove this and instead use vanilla caching for now. For some reason, TensorFlow doesn't complain with XLA; I think this might be because TensorFlow is not as stringent about static shapes as JAX. In any case, adding sliding window attention that is XLA compatible is a story for the future.
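For illustration only (not the actual cache code from this PR), here is a minimal JAX sketch of the kind of slicing that gets rejected under XLA, assuming a fixed window size and a traced `cache_update_index`:

```python
import jax
import jax.numpy as jnp

WINDOW = 4  # illustrative sliding-window size

@jax.jit
def sliding_window_slice(cache, cache_update_index):
    # Under jit, cache_update_index is a traced value. NumPy-style slice
    # bounds must be static, so this expression is rejected at trace time.
    return cache[:, cache_update_index - WINDOW : cache_update_index, :]

cache = jnp.zeros((1, 16, 8))  # (batch, max_len, head_dim); made-up shapes
sliding_window_slice(cache, jnp.array(8))  # errors: slice bounds must be static
```

(`jax.lax.dynamic_slice` accepts traced start indices but still requires static slice sizes, which is presumably why every workaround here ends up needing dynamic shapes at some point.)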
@mattdangerw Tested it with the 7B preset. The outputs of both the backbone and the generator match up. This is ready from my side! I can share the preset with you once this is merged.
Looks great! Just a couple minor comments.
**kwargs,
)

# Default compilation
Minor nit: we were styling this as its own heading. https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/gpt2/gpt2_causal_lm.py#L172
Done.
hidden_states = backbone(inputs)
outputs = backbone.token_embedding(hidden_states, reverse=True)

# Instantiate the Functional API Model constructor.
Delete this comment and the newline above; the header gives enough clues about what this is.
Done.
padding_mask = padding_mask.astype("bool")
# Strip any special tokens during detokenization (e.g. the start and
# end markers). In the future we could make this configurable.
padding_mask = padding_mask & (token_ids != self.tokenizer.end_token_id)
I think we want to also remove the `start_token_id` (as it is a different token). Just add a line like this below with `start_token_id` instead.
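Something along these lines, assuming the tokenizer exposes `start_token_id` the same way it exposes `end_token_id`:

```python
padding_mask = padding_mask & (token_ids != self.tokenizer.start_token_id)
```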
Done.
prompt. The generation strategy used is controlled by an additional
`sampler` argument on `compile()`. You can recompile the model with
different `keras_nlp.samplers` objects to control the generation. By
default, `"top_k"` sampling will be used.
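For reference, switching samplers on a keras_nlp causal LM looks like the sketch below; the preset name is a placeholder, since no Mistral presets are added in this PR:

```python
import keras_nlp

# "mistral_7b_en" is a placeholder; substitute however the model is loaded.
generator = keras_nlp.models.MistralCausalLM.from_preset("mistral_7b_en")

# The default compilation uses top-k sampling; recompile for greedy decoding.
generator.compile(sampler="greedy")
print(generator.generate("What is Keras?", max_length=100))

# Or pass a configured sampler object instead of a string.
generator.compile(sampler=keras_nlp.samplers.TopKSampler(k=10))
print(generator.generate("What is Keras?", max_length=100))
```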
Is this a good default? For these newer, larger models, we might just want to default to greedy if performance is good.
Maybe do a quick check: does it tend to get stuck in loops with greedy sampling?
This was the output with the "greedy" sampler:
>>> output = generator.generate("What is Keras?", max_length=100)
2024-02-13 06:42:36.336579: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 20952865944 exceeds 10% of free system memory.
>>> print(output)
What is Keras?
Keras is a high-level neural network API, written in Python and capable of running on top of TensorFlow, CNTK or Theano. It was designed with a focus on usability, modularity and extensibility.
Keras is a high-level neural network API, written in Python and capable of running on top of TensorFlow, CNTK or Theano. It was designed with a focus on usability, mod
Noticed the same output with HF. I guess, for most prompts, the model would get stuck in a loop eventually.
HF Output:
>>> print(tokenizer.batch_decode(generated_ids)[0])
<s> What is Keras?
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Keras is meant for quick prototyping and easy and fast training. It should not be used in production.
Keras is a high-level API, which means that it is designed to be used by developers who are not experts in machine learning. It is designed to be easy to use, and to make it easy to experiment with different ideas.
Keras is a high-level API, which means that it is designed to be used by developers who are not experts in machine learning. It is designed to be easy to use, and to make it easy to experiment with different ideas.
Keras is a high-
Thanks for checking! Let's stick with top-k then.
@mattdangerw I forgot to put the tokenizer and the LM preprocessor in the public API, will address that along with your comments.
This PR adds a Causal LM for Mistral called `MistralCausalLM` and a preprocessor for it called `MistralCausalLMPreprocessor`. Presets are not added yet but can be done in a follow-up PR.

Note that I removed the sliding window attention cache from the attention layer for Mistral. This is because JAX was complaining about dynamic slicing, which is required to make the caching work. More explanation in this commit: 19b0b89

I am in the process of testing whether this model matches the outputs of the original model after the weights transfer. Once that's done, I can open the PR up for reviews.
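As a rough sketch, the output-matching check described above boils down to comparing the two implementations on identical token ids after the weight transfer (the helper name and tolerance below are just placeholders):

```python
import numpy as np

def check_outputs_match(keras_outputs, reference_outputs, atol=1e-3):
    # Compare hidden states (or logits) from the KerasNLP port and the
    # original implementation, computed on the same token ids.
    np.testing.assert_allclose(
        np.asarray(keras_outputs), np.asarray(reference_outputs), atol=atol
    )
```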