Implement compute_output_spec() for tokenizers with vocabulary. #1523

briango28 · 2024-03-25T08:23:23Z

Implements the same compute_output_spec() method for BytePairTokenizer, WordPieceTokenizer, and SentencePieceTokenizer.

briango28 · 2024-03-26T03:32:41Z

The previous version used keras.KerasTensor, which apparently does not exist in Keras 2.
Updated to use keras.Input instead.

briango28 · 2024-03-28T05:28:12Z

Ran format.sh.
I was working behind a MITM proxy without a proper Linux environment and had to resort to manual copying, which turned out to be rather unwieldy.
Hopefully it will pass the tests now.

mattdangerw

Thanks! Just a few comments

keras_nlp/tokenizers/byte_pair_tokenizer.py

briango28 · 2024-03-29T02:25:26Z

Applied above discussions. The function now looks like this:

class TokenizerWithVocabulary:
    def compute_output_spec(self, input_spec) -> keras.KerasTensor:
        return keras.KerasTensor(
            input_spec.shape + (self.sequence_length,), dtype=self.compute_dtype
        )
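
To make the shape arithmetic above concrete, here is a minimal, dependency-free sketch. Spec and ToyTokenizer are hypothetical stand-ins (not part of Keras or KerasNLP) that carry only the shape, dtype, sequence_length, and compute_dtype fields the method reads, and the sketch assumes sequence_length is set to an integer.

```python
from typing import NamedTuple, Optional, Tuple


class Spec(NamedTuple):
    """Hypothetical stand-in for keras.KerasTensor: just a shape and a dtype."""

    shape: Tuple[Optional[int], ...]
    dtype: str


class ToyTokenizer:
    """Hypothetical tokenizer stub holding only the fields the method reads."""

    def __init__(self, sequence_length: int, compute_dtype: str = "int32"):
        self.sequence_length = sequence_length
        self.compute_dtype = compute_dtype

    def compute_output_spec(self, input_spec: Spec) -> Spec:
        # Tuple concatenation appends one trailing axis for the token ids.
        return Spec(
            input_spec.shape + (self.sequence_length,),
            dtype=self.compute_dtype,
        )


tokenizer = ToyTokenizer(sequence_length=128)
# A batch of strings with unknown batch size gains a token axis.
batched = tokenizer.compute_output_spec(Spec(shape=(None,), dtype="string"))
# A single scalar string becomes a 1-D sequence of token ids.
scalar = tokenizer.compute_output_spec(Spec(shape=(), dtype="string"))
print(batched.shape)  # (None, 128)
print(scalar.shape)  # (128,)
```

The output dtype comes from the tokenizer, not the input, since string inputs map to integer token ids.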

keras_nlp/tokenizers/sentence_piece_tokenizer.py

mattdangerw · 2024-03-29T19:18:03Z

Thank you!

…s-team#1523)
* Implement compute_output_spec() for tokenizers with vocabulary. (restarted from new point in master branch)
* Remove type annotation from compute_output_spec() in tokenizers

briango28 mentioned this pull request Mar 25, 2024

SentencePieceTokenizer inside a keras.models.Model fails to be reconstructed during keras.saving.load_model() #1522

Closed

mattdangerw self-requested a review March 29, 2024 00:01

mattdangerw reviewed Mar 29, 2024

keras_nlp/tokenizers/byte_pair_tokenizer.py (outdated, resolved)

keras_nlp/tokenizers/byte_pair_tokenizer.py (outdated, resolved)

briango28 closed this Mar 29, 2024

Implement compute_output_spec() for tokenizers with vocabulary. (restarted from new point in master branch)

f7ecd40

briango28 reopened this Mar 29, 2024

mattdangerw reviewed Mar 29, 2024

keras_nlp/tokenizers/sentence_piece_tokenizer.py (outdated, resolved)

Remove type annotation from compute_output_spec() in tokenizers

eba4757

mattdangerw added the kokoro:force-run Runs Tests on GPU label Mar 29, 2024

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 29, 2024

mattdangerw merged commit 5341426 into keras-team:master Mar 29, 2024
10 checks passed
