
Add Perplexity Metric #68

Merged: 25 commits merged into keras-team:master on Apr 13, 2022
Conversation

@chenmoneygithub (Contributor)

Please ping us when this is ready for review, thanks!

@abheesht17 (Collaborator Author)

> Please ping us when this is ready for review, thanks!

Hey, @chenmoneygithub! This PR is ready for review. I have not added test cases yet because if there are any major changes to be made, I'll have to make the same changes in test cases; I was waiting for an initial review. If you think the overall structure looks good, I can add unit tests now. Thanks!

@chenmoneygithub (Contributor)

> Hey, @chenmoneygithub! This PR is ready for review. [...] If you think the overall structure looks good, I can add unit tests now. Thanks!

Since you are overriding the Metric class's methods instead of proposing new ones, the API interface is stable, so your revision won't cause back-and-forth edits on the test file. Please add unit tests so that we can better evaluate the functionality, thanks!

@abheesht17 (Collaborator Author)

> Since you are overriding the Metric class's methods instead of proposing new ones, the API interface is stable [...] Please add unit tests so that we can better evaluate the functionality, thanks!

Awesome! I'll add unit tests 👍🏼

@abheesht17 (Collaborator Author)

> Please ping us when this is ready for review, thanks!

@mattdangerw, @chenmoneygithub, I've added UTs. This PR is now ready for review :)

@chenmoneygithub (Contributor) left a comment:

Thanks for the PR! Dropped some initial comments.


if self.pad_token_id is not None:
    sample_weight = tf.cast(
        tf.math.logical_not(tf.equal(y_true, 0)), self._dtype
Contributor:

Should this be tf.equal(y_true, self.pad_token_id)?

Collaborator Author:

Ah, yes. Fixed 👍🏼

)

def update_state(self, y_true, y_pred, sample_weight=None):
    # y_true shape: (bsz, seq_len), y_pred shape: (bsz, seq_len, vocab_size)
Contributor:

bsz is vague; please just use the full name batch_size.

)

# Reshape y_true and y_pred.
y_true = tf.reshape(y_true, [-1])  # (bsz * seq_len,)
Contributor:

Curious: why are you doing this reshaping? I think SparseCategoricalCrossentropy can handle the original shape:

y_true = tf.constant([[1, 1, 0], [0, 2, 1]])
y_pred = tf.random.uniform(shape=[2, 3, 3])

entropy = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="sum"
)
entropy(y_true, y_pred)

The above code runs fine; am I missing something?

Collaborator Author:

Yeah, you are right. Actually, the reason why I do the reshaping is here: https://github.com/huggingface/transformers/blob/main/examples/research_projects/codeparrot/scripts/validation_loss.py#L56-L69 and https://github.com/huggingface/transformers/blob/v4.17.0/src/transformers/models/bert/modeling_bert.py#L1363.

Essentially, in PyTorch, when I compute the Cross Entropy loss and set the reduction method to mean, it computes the mean over the non-masked tokens. So, it sums the values of the non-masked tokens and divides by the number of non-masked tokens.

However, when I tried some experiments with tf.keras.losses.SparseCategoricalCrossentropy and set the reduction method to mean, it sums over the non-masked tokens but divides by the number of ALL tokens (basically, the first dimension). So, initially, I'd set the reduction method to mean, but I changed it to sum and handled the denominator in subsequent lines.

Now that we have set the reduction to sum, I can remove the lines where I do reshaping. Thanks for pointing this out!
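
For illustration, here is a minimal sketch of the sum-then-divide approach described above. The shapes, the seed, and the assumption that 0 is the padding token are illustrative, not taken from the PR:

```python
import tensorflow as tf

y_true = tf.constant([[1, 3, 0], [2, 1, 3]])  # 0 is the assumed padding token
y_pred = tf.random.uniform(shape=[2, 3, 4], seed=42)

# 1 for real tokens, 0 for padding tokens.
sample_weight = tf.cast(tf.not_equal(y_true, 0), tf.float32)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="sum"
)

# Sum of the weighted cross entropy over non-masked tokens...
total_loss = loss_fn(y_true, y_pred, sample_weight=sample_weight)

# ...divided by the number of non-masked tokens, matching PyTorch's
# reduction="mean" behaviour on masked inputs.
masked_mean = total_loss / tf.reduce_sum(sample_weight)
```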


@abheesht17 (Collaborator Author)

@chenmoneygithub, addressed your comments! Thanks for reviewing :)

@chenmoneygithub (Contributor) left a comment:

Thanks! The functionality looks good to me! Dropped some comments on the coding style.



keras_nlp/metrics/perplexity.py (thread resolved)
```python
# 1. update_state() and result()
perplexity = keras_nlp.metrics.Perplexity(name="perplexity")
target = tf.experimental.numpy.random.randint(low=0, high=10, size=(2, 5))
```

Contributor:

Prefer using this:

target = tf.random.uniform(shape=[2, 5], maxval=10, dtype=tf.int32)

Once the numpy API graduates from the experimental namespace, this code would break.

Additionally, we may want to format this example section like this: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/tokenizers/word_piece_tokenizer.py#L123, which is clearer about the output, and more testable.

Collaborator Author:

Sure, will change 👍🏼

    pad_token_id=None,
    **kwargs,
):
    super(Perplexity, self).__init__(name=name, dtype=dtype, **kwargs)
Contributor:

Please use the Python 3 style:

super().__init__(name=name, dtype=dtype, **kwargs)

    **kwargs,
):
    super(Perplexity, self).__init__(name=name, dtype=dtype, **kwargs)
    if dtype is None:
Contributor:

Shall we move this default to the argument?

Contributor:

Oh, this comment went into an awkward position. I mean:

if dtype is None:
  self._dtype = tf.float32

Member:

I think the default is handled on the base class already. Can we remove this check entirely? And keep the error if not self.dtype.is_floating?

@abheesht17 (Collaborator Author, Apr 2, 2022):

I have used this condition:

if not tf.as_dtype(self.dtype).is_floating:

since the parent class sets it to a string ("float32") if dtype is None.

f"Received: dtype={dtype}"
)

self.cross_entropy_loss = keras.losses.SparseCategoricalCrossentropy(
Contributor:

This should be a private member, since it is not directly passed as a constructor argument.

Member:

Also, I think it would be good to use the metric version here if possible (https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SparseCategoricalCrossentropy), and remove _loss from your variable names. We are dealing with a metric here, not a loss.

@abheesht17 (Collaborator Author, Apr 2, 2022):

@mattdangerw, I gave this a look and I don't think we can use tf.keras.metrics.SparseCategoricalCrossentropy(), since it doesn't have the reduction arg; it just takes the mean of all the values. We don't want the mean because we want to handle masked tokens later.

A sample of what I mean (pun on mean unintended xD):

loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE
)
metric = tf.keras.metrics.SparseCategoricalCrossentropy(from_logits=True)

target = tf.random.uniform(shape=[2, 5], maxval=10, dtype=tf.int32, seed=42)
logits = tf.random.uniform(shape=(2, 5, 10), seed=42)

print("Element-wise Loss: ", loss(target, logits))
print("Metric Value: ", metric(target, logits))

Output:

Element-wise Loss:  tf.Tensor(
[[2.6863036 2.5335345 2.2001379 2.7056365 1.8480766]
 [2.319472  2.1234791 2.0424395 2.079166  2.5738573]], shape=(2, 5), dtype=float32)
Metric Value:  tf.Tensor(2.3112102, shape=(), dtype=float32)


self.pad_token_id = pad_token_id

self.aggregate_cross_entropy_loss = self.add_weight(
Contributor:

Should be private.

initializer="zeros",
dtype=self._dtype,
)
self.number_of_samples = self.add_weight(
Contributor:

Same here, should be private.



class PerplexityTest(tf.test.TestCase):
    @classmethod
Contributor:

Curious: why are you using the classmethod setUpClass instead of setUp(self)?

Also, I would prefer setting the values of x and y in separate test cases. The disadvantage is that we copy code around, but the advantage is that each test case is clearer about its purpose and data.

Collaborator Author:

setUp() runs before every unit test, whereas setUpClass() runs only once, before all the UTs.

For some reason, even though I've set the random seed, setUp() gives different outputs for every test. That's why I used setUpClass().
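
For context, a minimal sketch of the setUpClass() pattern being discussed; the tensor names, shapes, and seed are illustrative, not copied from the PR:

```python
import tensorflow as tf


class PerplexityTest(tf.test.TestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Runs once for the whole class, so every test method sees the
        # same randomly generated inputs.
        tf.random.set_seed(42)
        cls.y_true = tf.random.uniform(shape=[2, 5], maxval=10, dtype=tf.int32)
        cls.y_pred = tf.random.uniform(shape=[2, 5, 10])
```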

Collaborator Author:

Sure, I'll add x and y to every UT.

perplexity = Perplexity(from_logits=True)

val1 = perplexity(self.y_true_1, self.y_pred_1, self.sample_wt_1)
self.assertAlmostEqual(val1, 9.682761)
Contributor:

Instead of using a random y_pred and putting a magic number here, I would prefer to set y_pred to some fixed values and compute the expected value manually.

Collaborator Author:

Ah, so do you mean that we should do something like this?

y_pred = tf.constant([[...], [...]], dtype=...)

As in, for every UT, we should set y_pred explicitly?

Contributor:

Yes, otherwise the test case is a bit opaque because we do not know what the input is like.

@mattdangerw (Member) left a comment:

Thanks @abheesht17! Some style comments from me as well.

# See the License for the specific language governing permissions and
# limitations under the License.

from keras_nlp.metrics.perplexity import Perplexity
Member:

You need to add an import of metrics in the __init__ file one directory up; otherwise the imports will not work in the exported package.
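
Presumably something like the following in keras_nlp/__init__.py (the surrounding imports are assumed for illustration, not copied from the repo):

```python
# keras_nlp/__init__.py (sketch)
from keras_nlp import layers
from keras_nlp import metrics  # newly added so keras_nlp.metrics resolves
from keras_nlp import tokenizers
```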


class Perplexity(keras.metrics.Metric):
    """Perplexity metric.
    This class implements the perplexity metric. In short, this class calculates
Member:

We should add more line breaks here: a blank line after the one-liner, a blank line after each paragraph, and blank lines before Args: and Examples:.

def update_state(self, y_true, y_pred, sample_weight=None):
    # y_true shape: (batch_size, seq_len)
    # y_pred shape: (batch_size, seq_len, vocab_size)
    y_true = tf.cast(y_true, self._dtype)
Member:

There is a dtype property on Metric, so this can be accessed as self.dtype.

(many other instances)

**kwargs: Other keyword arguments.
Examples:
```python
# 1. update_state() and result()
```

@mattdangerw (Member, Mar 31, 2022):

You could do the core Keras doc style here to show output:
https://github.com/keras-team/keras/blob/master/keras/metrics/metrics.py#L74

We don't have checks for this now, but we hope to add those soon. Output would make these more readable.

Also consider breaking these up with headers outside the code block, e.g.

Call the metric directly:
some code

Set padding id:
some code
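
For example, a sketch of the suggested doc style; the API follows this PR, and the doctest output line is elided rather than fabricated:

```python
Standalone usage:

>>> perplexity = keras_nlp.metrics.Perplexity(from_logits=True)
>>> target = tf.random.uniform(shape=[2, 5], maxval=10, dtype=tf.int32, seed=42)
>>> logits = tf.random.uniform(shape=(2, 5, 10), seed=42)
>>> perplexity(target, logits)
```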

f"Received: dtype={dtype}"
)

self.cross_entropy_loss = keras.losses.SparseCategoricalCrossentropy(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think it would be good to use the the metric version here if possible https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SparseCategoricalCrossentropy, and remove _loss from your variable names. We are dealing with a metric here, not a loss.


@abheesht17 (Collaborator Author) left a comment:

@mattdangerw, @chenmoneygithub, addressed all comments. Thank you for the review!

f"Received: dtype={dtype}"
)

self.cross_entropy_loss = keras.losses.SparseCategoricalCrossentropy(
Copy link
Collaborator Author

@abheesht17 abheesht17 Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattdangerw, I gave this a look and I don't think we can use tf.keras.metrics.SparseCategoricalCrossentropy(), since it doesn't have the reduction arg; it just takes the mean of all the values. We don't want the mean because we want to handle masked tokens later.

A sample of what I mean (pun on mean unintended xD):

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
metric = tf.keras.metrics.SparseCategoricalCrossentropy(from_logits=True)


target = tf.random.uniform(shape=[2, 5],  maxval=10, dtype=tf.int32, seed=42)
logits = tf.random.uniform(shape=(2, 5, 10), seed=42)


print("Print Element-wise Loss: ", loss(target, logits))
print("Metric Value: ", metric(target, logits))

Output:

Print Element-wise Loss:  tf.Tensor(
[[2.6863036 2.5335345 2.2001379 2.7056365 1.8480766]
 [2.319472  2.1234791 2.0424395 2.079166  2.5738573]], shape=(2, 5), dtype=float32)
Metric Value:  tf.Tensor(2.3112102, shape=(), dtype=float32)


@abheesht17 (Collaborator Author) commented Apr 2, 2022:

Not sure why the unit tests are failing here. They run as expected locally.

keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_from_logits_with_pad_token_id PASSED                                               [ 52%]
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_from_logits_with_sample_weight PASSED                                              [ 53%] 
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_from_logits_without_masking PASSED                                                 [ 55%] 
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_from_probs_with_pad_token PASSED                                                   [ 56%]
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_from_probs_with_sample_weight PASSED                                               [ 57%] 
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_merge_state PASSED                                                                 [ 59%]
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_reset_state PASSED                                                                 [ 60%] 
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_two_inputs_from_logits PASSED                                                      [ 63%]
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_update_state PASSED                                                                [ 65%]
keras_nlp/metrics/perplexity_test.py::PerplexityTest::test_vars_after_initializing_class PASSED                                               [ 66%] 
========================================================== 61 passed, 8 skipped in 27.33s ========================================================== 

Any idea, @mattdangerw, @chenmoneygithub?

Error given here: https://github.com/keras-team/keras-nlp/runs/5797546465?check_suite_focus=true

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/tensorflow/python/ops/numpy_ops/np_array_ops.py:734: in around
    return a.astype(dtype)

Not sure how to fix this.

sample_weight = tf.cast(sample_weight, self.dtype)

# Calculate the Cross Entropy Loss.
cross_entropy_value = tf.cast(
Contributor:

I checked the source code of tf.keras.metrics.SparseCategoricalCrossentropy, and it is doing a WEIGHTED_MEAN reduction (https://github.com/keras-team/keras/blob/d8fcb9d4d4dad45080ecfdd575483653028f8eda/keras/metrics.py#L583), which should automatically set the divisor to the sum over masks. Could you help verify it with your unit test? Thanks!

Collaborator Author:

Sure. Will try this out! Thanks!

Collaborator Author:

@chenmoneygithub, this particular UT failed:

____________________________________________________ PerplexityTest.test_two_inputs_from_logits ____________________________________________________ 

self = <keras_nlp.metrics.perplexity_test.PerplexityTest testMethod=test_two_inputs_from_logits>

    def test_two_inputs_from_logits(self):
        perplexity = Perplexity(from_logits=True, pad_token_id=0)

        y_true_1 = tf.constant([[1, 3, 0], [2, 1, 3]])
        y_pred_1 = tf.constant(
            [
                [
                    [1.034, 4.797, 2.82, 1.154],
                    [2.258, 1.591, 1.811, 1.852],
                    [3.216, 1.037, 0.3662, 2.7],
                ],
                [
                    [1.363, 1.726, 1.898, 2.582],
                    [1.163, 1.943, 1.761, 1.497],
                    [2.766, 1.453, 2.61, 2.805],
                ],
            ]
        )

        perplexity_val = perplexity(y_true_1, y_pred_1)
        self.assertAlmostEqual(perplexity_val, 2.8788896)

        y_true_2 = tf.constant([[2, 0, 0], [1, 2, 3]])
        y_pred_2 = tf.constant(
            [
                [
                    [2.887, 0.885, 2.973, 2.582],
                    [0.3838, 2.629, 1.91, 1.802],
                    [0.2578, 1.081, 1.125, 2.773],
                ],
                [
                    [1.623, 2.784, 0.2109, 2.66],
                    [2.395, 2.01, 0.252, 1.828],
                    [0.4482, 2.629, 0.9697, 0.998],
                ],
            ]
        )
        perplexity_val = perplexity(y_true_2, y_pred_2)
>       self.assertEqual(perplexity_val, 3.9998498)
E       AssertionError: <tf.Tensor: shape=(), dtype=float32, numpy=3.3319612> != 3.9998498

keras_nlp\metrics\perplexity_test.py:132: AssertionError

@abheesht17 (Collaborator Author, Apr 6, 2022):

I did a quick analysis on Colab. Apparently, when sample_weight is provided, the outputs of keras.losses.SparseCategoricalCrossentropy and keras.metrics.SparseCategoricalCrossentropy don't match. Have a look at this: https://colab.research.google.com/drive/1Jh44hylKVmmdqR1Z3B_redGzr1ZJdI3i?usp=sharing.
Is it some precision thingy which is messing up the values?

Let me know what the correct course of action is. Thanks!
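
A minimal sketch of the comparison described in that notebook, with illustrative shapes and seed:

```python
import tensorflow as tf

y_true = tf.constant([[1, 3, 0], [2, 1, 3]])
y_pred = tf.random.uniform(shape=[2, 3, 4], seed=42)
sample_weight = tf.cast(tf.not_equal(y_true, 0), tf.float32)

# Loss path: sum the weighted losses, then divide by the mask sum manually.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="sum"
)
loss_value = loss_fn(y_true, y_pred, sample_weight=sample_weight)
loss_value /= tf.reduce_sum(sample_weight)

# Metric path: relies on the metric's internal WEIGHTED_MEAN reduction.
metric = tf.keras.metrics.SparseCategoricalCrossentropy(from_logits=True)
metric_value = metric(y_true, y_pred, sample_weight=sample_weight)

# Per the thread, these two values can disagree when sample_weight is set.
print(loss_value, metric_value)
```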

Contributor:

This is pretty odd. I guess we can stick to the Loss function and open an issue for future investigation.

Collaborator Author:

Coolio :). Thanks! 👍🏼

@chenmoneygithub (Contributor)

Your test failure seems to be caused by some env configuration:

E       AttributeError: 
E               'EagerTensor' object has no attribute 'astype'.
E               If you are looking for numpy-related methods, please run the following:
E               from tensorflow.python.ops.numpy_ops import np_config
E               np_config.enable_numpy_behavior()

I am not sure how that happens, will forward this to the team.

@abheesht17 (Collaborator Author)

> Your test failure seems to be caused by some env configuration [...] I am not sure how that happens, will forward this to the team.

Hey, @chenmoneygithub. Any possible solution for this?

@abheesht17 (Collaborator Author)

Hey, @chenmoneygithub! I've resolved the error we were facing with unit tests. Let me know if further changes are to be made. Thank you!

@chenmoneygithub (Contributor) left a comment:

Looks good!


perplexity_1.update_state(y_true_1, y_pred_1)
perplexity_1.update_state(y_true_2, y_pred_2)
self.assertAlmostEqual(
Contributor:

Please avoid directly checking private members; if we think something is necessary to check, we should consider exposing a public interface for it.

For this case, can we just check the perplexity value?

@abheesht17 (Collaborator Author, Apr 8, 2022):

Ah, yes. I just wanted to show that cross_entropy_for_state_1 + cross_entropy_for_state_2 = cross_entropy_for_state_3. But yeah, checking the perplexity value will suffice. Changes made! I've kept the check for the number of samples, though.

Contributor:

Thanks for fixing! Please also apply similar changes to the other test cases; basically, we never want to explicitly check private fields, because private fields are subject to change without notice.

Collaborator Author:

@chenmoneygithub, done! Thanks!



@chenmoneygithub (Contributor) left a comment:

Thanks for fixing!

@fchollet (Collaborator) left a comment:

Thanks for the PR! Some drive-by comments.


def __init__(
    self,
    name="perplexity",
Collaborator:

The name argument should come last; it's a base class argument.

def __init__(
    self,
    name="perplexity",
    dtype=None,
Collaborator:

Same for dtype (which comes before name).

self.pad_token_id = pad_token_id

self._aggregate_cross_entropy = self.add_weight(
    name="aggregate_cross_entropy",
Collaborator:

Nit: spell "crossentropy" as a single word, for consistency. This applies to the weight name and also to Python variable names.

from tensorflow import keras


class Perplexity(keras.metrics.Metric):
Collaborator:

Please make the metric serializable by adding a get_config() method.
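
For reference, a minimal sketch of such a method, assuming the constructor arguments discussed in this PR (from_logits and pad_token_id; the exact attribute names are assumptions):

```python
def get_config(self):
    config = super().get_config()
    config.update(
        {
            "from_logits": self.from_logits,
            "pad_token_id": self.pad_token_id,
        }
    )
    return config
```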

from_logits: bool. If True, `y_pred` (input to `update_state()`) should
    be the logits as returned by the model. Otherwise, `y_pred` is a
    tensor of probabilities.
pad_token_id: int. Token ID of the padding token. If provided, the mask
Collaborator:

Also prefer "mask_token_id" over "pad_token_id"

    be the logits as returned by the model. Otherwise, `y_pred` is a
    tensor of probabilities.
pad_token_id: int. Token ID of the padding token. If provided, the mask
    is computed by this class (all padding tokens are masked while
Collaborator:

by -> for ?

pad_token_id: int. Token ID of the padding token. If provided, the mask
    is computed by this class (all padding tokens are masked while
    computing the cross entropy loss). Note that if this field is
    provided, the `sample_weight` field in `update_state()` is ignored.
Collaborator:

This behavior is problematic; we should combine the masks, not drop one of them.

@abheesht17 (Collaborator Author, Apr 9, 2022):

What do you think is the best way to combine the masks? Element-wise maximum or element-wise addition (if both are not None)? Or do you have something else in mind?

Member:

I think we can just multiply the masks together. If the padding token is set, that will give a mask of 1s and 0s, which can be multiplied with sample_weight.

Put one way: if a padding token has a sample weight of 0.5, it should be ignored. If a non-padding token has a sample weight of 0.5, we should still weight it by its sample weight before summing it in.
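
A minimal sketch of the multiplication being suggested, inside update_state(); the variable names follow the diff above, and the snippet is illustrative rather than the final code:

```python
if self.pad_token_id is not None:
    # 1 for real tokens, 0 for padding tokens.
    mask = tf.cast(
        tf.math.logical_not(tf.equal(y_true, self.pad_token_id)), self.dtype
    )
    if sample_weight is None:
        sample_weight = mask
    else:
        # Multiplying keeps padding tokens ignored even with a nonzero
        # sample weight, while real tokens keep their user-provided weight.
        sample_weight = mask * tf.cast(sample_weight, self.dtype)
```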

Collaborator Author:

Great! Done 👍🏼

@abheesht17 (Collaborator Author) left a comment:

Thank you, @fchollet, for the review comments! I've addressed all of them save one, where I'd like some clarification.


@abheesht17 (Collaborator Author)

@mattdangerw, I've addressed your comment. Any further changes required?

@mattdangerw (Member) left a comment:

LGTM!

@mattdangerw mattdangerw merged commit feebbd0 into keras-team:master Apr 13, 2022
adhadse pushed a commit to adhadse/keras-nlp that referenced this pull request Sep 17, 2022
* Add class for perplexity

* Fix doc-string example

* Add shape checks

* Small typo

* Small typo - 2

* Remove shape and rank checks - they don't work

* Add UTs

* Remove reshape ops

* Address review comments - I

* Add UT for no masking case

* Minor change

* Add a check for dtype

* Format code

* Address review comments - II, III

* Fix formatting issue

* Fix UTs (Attempt I)

* Fix UTs (Attempt II)

* Fix UTs (Attempt III)

* Remove UTs for private members

* Remove checks for private members in UTs

* Address review comments - IV

* Change sample_weight, mask functionality

* Format + Lint
Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close the issue "Add a perplexity metric".
4 participants