Add Perplexity Metric #68
Conversation
Please ping us when this is ready for review, thanks!
Hey, @chenmoneygithub! This PR is ready for review. I have not added test cases yet because if there are any major changes to be made, I'll have to make the same changes in the test cases; I was waiting for an initial review. If you think the overall structure looks good, I can add unit tests now. Thanks!
Since you are overriding the Metric class's methods instead of proposing new methods, the API interface is stable, so your revision won't cause back-and-forth edits on the test file. Please add unit tests so that we can better evaluate the functionality, thanks!
Awesome! I'll add unit tests 👍🏼
@mattdangerw, @chenmoneygithub, I've added UTs. This PR is now ready for review :)
Thanks for the PR! Dropped some initial comments.
keras_nlp/metrics/perplexity.py (Outdated)

        if self.pad_token_id is not None:
            sample_weight = tf.cast(
                tf.math.logical_not(tf.equal(y_true, 0)), self._dtype
Should this be `tf.equal(y_true, self.pad_token_id)`?
Ah, yes. Fixed 👍🏼
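For reference, a tiny standalone illustration of the corrected expression, with made-up values (not the PR's actual code):

```python
import tensorflow as tf

y_true = tf.constant([[1, 3, 0], [2, 1, 3]])
pad_token_id = 0
# 1.0 for real tokens, 0.0 wherever y_true equals the padding id.
mask = tf.cast(tf.math.logical_not(tf.equal(y_true, pad_token_id)), tf.float32)
# -> [[1., 1., 0.], [1., 1., 1.]]
```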
keras_nlp/metrics/perplexity.py (Outdated)

    )

    def update_state(self, y_true, y_pred, sample_weight=None):
        # y_true shape: (bsz, seq_len), y_pred shape: (bsz, seq_len, vocab_size)
bsz is vague; please just use the full name batch_size.
keras_nlp/metrics/perplexity.py (Outdated)

        )

        # Reshape y_true and y_pred.
        y_true = tf.reshape(y_true, [-1])  # (bsz * seq_len,)
Curious - why are you doing this reshaping? I think `SparseCategoricalCrossentropy` can handle the original shape?
y_true = tf.constant([[1, 1, 0], [0, 2, 1]])
y_pred = tf.random.uniform(shape=[2, 3, 3])
entropy = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="sum"
)
entropy(y_true, y_pred)
The above code can run, am I missing something?
Yeah, you are right. Actually, the reason why I do the reshaping is here: https://github.com/huggingface/transformers/blob/main/examples/research_projects/codeparrot/scripts/validation_loss.py#L56-L69 and https://github.com/huggingface/transformers/blob/v4.17.0/src/transformers/models/bert/modeling_bert.py#L1363.
Essentially, in PyTorch, when I compute the Cross Entropy loss and set the reduction method to `mean`, it computes the mean over the non-masked tokens. So, it sums the values of the non-masked tokens and divides by the number of non-masked tokens.
However, when I tried out some experiments with `tf.keras.losses.SparseCategoricalCrossentropy` and set the reduction method to `mean`, it sums over the non-masked tokens but divides by the number of ALL tokens (basically, the first dimension). So, initially, I'd set the reduction method to `mean`, but changed it to `sum` and handled the denominator in subsequent lines.
Now that we have set the reduction to `sum`, I can remove the lines where I do reshaping. Thanks for pointing this out!
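A minimal standalone sketch of the "sum, then divide by the non-masked token count" approach described above; the example tensors and the pad id of 0 are illustrative assumptions, not the PR's actual code:

```python
import tensorflow as tf

y_true = tf.constant([[1, 3, 0], [2, 1, 3]])           # (batch_size, seq_len), 0 = padding
y_pred = tf.random.uniform(shape=[2, 3, 4])            # (batch_size, seq_len, vocab_size) logits
mask = tf.cast(tf.not_equal(y_true, 0), tf.float32)    # 1 for real tokens, 0 for padding

crossentropy = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="sum"
)
# Sum of per-token losses over non-masked tokens only...
loss_sum = crossentropy(y_true, y_pred, sample_weight=mask)
# ...divided by the number of non-masked tokens, mirroring PyTorch's masked "mean".
mean_loss = loss_sum / tf.reduce_sum(mask)
perplexity = tf.exp(mean_loss)
```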
@chenmoneygithub, addressed your comments! Thanks for reviewing :)
Thanks! The functionality looks good to me! Dropped some comments on the coding style.
keras_nlp/metrics/perplexity.py (Outdated)

    ```python
    # 1. update_state() and result()
    perplexity = keras_nlp.metrics.Perplexity(name="perplexity")
    target = tf.experimental.numpy.random.randint(low=0, high=10, size=(2, 5))
Prefer using this:
target = tf.random.uniform(shape=[2, 5], maxval=10, dtype=tf.int32)
since once the numpy stuff graduates from the experimental namespace, this code would break.
Additionally, we may want to format this example section like this: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/tokenizers/word_piece_tokenizer.py#L123, which is clearer about the output, and more testable.
Sure, will change 👍🏼
keras_nlp/metrics/perplexity.py (Outdated)

        pad_token_id=None,
        **kwargs,
    ):
        super(Perplexity, self).__init__(name=name, dtype=dtype, **kwargs)
Please use the Python 3 style:
super().__init__(name=name, dtype=dtype, **kwargs)
keras_nlp/metrics/perplexity.py (Outdated)

        **kwargs,
    ):
        super(Perplexity, self).__init__(name=name, dtype=dtype, **kwargs)
        if dtype is None:
Shall we move this default to the argument?
Do you mean moving it to `kwargs`?
I'm planning to handle it this way: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/tokenizers/word_piece_tokenizer.py#L177-L186.
Oh, this comment landed in an awkward position. I mean:
if dtype is None:
    self._dtype = tf.float32
I think the default is handled on the base class already. Can we remove this check entirely? And keep the error if `not self.dtype.is_floating`?
Yeah, sure. I think you are talking about this: https://github.com/keras-team/keras/blob/master/keras/metrics/base_metric.py#L122-L123.
I have used this condition:
if not tf.as_dtype(self.dtype).is_floating:
since the parent class sets it to a string ("float32") if dtype is None.
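A rough sketch of how that constructor check could look (the surrounding code and argument names are guesses, not the PR's exact implementation):

```python
import tensorflow as tf
from tensorflow import keras


class Perplexity(keras.metrics.Metric):
    def __init__(self, from_logits=False, pad_token_id=None, dtype=None, name="perplexity", **kwargs):
        super().__init__(name=name, dtype=dtype, **kwargs)
        # The base class defaults dtype to the string "float32" when dtype is None,
        # so convert it back to a DType before checking.
        if not tf.as_dtype(self.dtype).is_floating:
            raise ValueError(
                "`dtype` must be a floating point type. "
                f"Received: dtype={dtype}"
            )
```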
keras_nlp/metrics/perplexity.py (Outdated)

                f"Received: dtype={dtype}"
            )

        self.cross_entropy_loss = keras.losses.SparseCategoricalCrossentropy(
This should be a private member, since it is not directly passed from constructor argument.
Also, I think it would be good to use the metric version here if possible, https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SparseCategoricalCrossentropy, and remove `_loss` from your variable names. We are dealing with a metric here, not a loss.
@mattdangerw, I gave this a look and I don't think we can use `tf.keras.metrics.SparseCategoricalCrossentropy()`, since it doesn't have the `reduction` arg; it just takes the mean of all the values. We don't want the mean because we want to handle masked tokens later.
A sample of what I mean (pun on mean unintended xD):
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
metric = tf.keras.metrics.SparseCategoricalCrossentropy(from_logits=True)
target = tf.random.uniform(shape=[2, 5], maxval=10, dtype=tf.int32, seed=42)
logits = tf.random.uniform(shape=(2, 5, 10), seed=42)
print("Print Element-wise Loss: ", loss(target, logits))
print("Metric Value: ", metric(target, logits))
Output:
Print Element-wise Loss: tf.Tensor(
[[2.6863036 2.5335345 2.2001379 2.7056365 1.8480766]
[2.319472 2.1234791 2.0424395 2.079166 2.5738573]], shape=(2, 5), dtype=float32)
Metric Value: tf.Tensor(2.3112102, shape=(), dtype=float32)
keras_nlp/metrics/perplexity.py (Outdated)

        self.pad_token_id = pad_token_id

        self.aggregate_cross_entropy_loss = self.add_weight(
should be private.
keras_nlp/metrics/perplexity.py (Outdated)

            initializer="zeros",
            dtype=self._dtype,
        )
        self.number_of_samples = self.add_weight(
Same here, should be private.
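Taken together, the comments so far point at a shape roughly like the following. This is a heavily simplified sketch with assumed names, accumulating per-token totals (which may differ from the PR's actual per-sample bookkeeping); it is not the PR's final code:

```python
import tensorflow as tf
from tensorflow import keras


class PerplexitySketch(keras.metrics.Metric):
    """Illustration only: accumulate crossentropy and token count, exponentiate in result()."""

    def __init__(self, from_logits=False, pad_token_id=None, dtype=None, name="perplexity", **kwargs):
        super().__init__(name=name, dtype=dtype, **kwargs)
        self.from_logits = from_logits
        self.pad_token_id = pad_token_id
        self._crossentropy = keras.losses.SparseCategoricalCrossentropy(
            from_logits=from_logits, reduction="sum"
        )
        # Private accumulators, exposed only through result().
        self._aggregate_crossentropy = self.add_weight(
            name="aggregate_crossentropy", initializer="zeros", dtype=self.dtype
        )
        self._number_of_tokens = self.add_weight(
            name="number_of_tokens", initializer="zeros", dtype=self.dtype
        )

    def update_state(self, y_true, y_pred, sample_weight=None):
        # y_true: (batch_size, seq_len), y_pred: (batch_size, seq_len, vocab_size)
        if self.pad_token_id is not None:
            pad_mask = tf.cast(tf.not_equal(y_true, self.pad_token_id), self.dtype)
            if sample_weight is None:
                sample_weight = pad_mask
            else:
                sample_weight = tf.cast(sample_weight, self.dtype) * pad_mask
        if sample_weight is None:
            sample_weight = tf.ones_like(y_true, dtype=self.dtype)
        self._aggregate_crossentropy.assign_add(
            self._crossentropy(y_true, y_pred, sample_weight=sample_weight)
        )
        self._number_of_tokens.assign_add(tf.reduce_sum(sample_weight))

    def result(self):
        return tf.exp(self._aggregate_crossentropy / self._number_of_tokens)
```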
keras_nlp/metrics/perplexity_test.py (Outdated)

    class PerplexityTest(tf.test.TestCase):
        @classmethod
Curious: why are you using the classmethod setUpClass instead of setUp(self)?
Also, I would prefer setting the values of x, y in separate test cases - the disadvantage is that we are copying code around, but the advantage is that each test case is clearer about its purpose and data.
setUp() runs before every unit test, whereas setUpClass() runs only once before all the UTs.
For some reason, even though I've set the random seed, setUp() gives different outputs for every test. That's why I used setUpClass().
Sure, I'll add x and y to every UT.
keras_nlp/metrics/perplexity_test.py (Outdated)

        perplexity = Perplexity(from_logits=True)

        val1 = perplexity(self.y_true_1, self.y_pred_1, self.sample_wt_1)
        self.assertAlmostEqual(val1, 9.682761)
Instead of using a random y_pred and putting a magic number here, I would prefer to arbitrarily set y_pred to some fixed numbers, and compute the expected value manually.
Ah, so do you mean that we should do something like this?
y_pred = tf.constant([[...], [...]], dtype=...)
As in, for every UT, we should set y_pred explicitly?
Yes, otherwise the test case is a bit opaque because we do not know what the input is like.
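For illustration, a test along those lines might look like the sketch below. The tensors are arbitrary fixed values, the expected value is derived inside the test from the loss definition rather than hard-coded, and whether that formula matches the PR's exact bookkeeping is an assumption:

```python
import tensorflow as tf
from tensorflow import keras

from keras_nlp.metrics.perplexity import Perplexity  # the class added by this PR


class PerplexityTestSketch(tf.test.TestCase):
    def test_fixed_inputs(self):
        # Fixed, explicit logits so the test data is visible in the test itself.
        y_true = tf.constant([[1, 3, 0], [2, 1, 3]])
        y_pred = tf.constant(
            [
                [[1.0, 4.8, 2.8, 1.2], [2.3, 1.6, 1.8, 1.9], [3.2, 1.0, 0.4, 2.7]],
                [[1.4, 1.7, 1.9, 2.6], [1.2, 1.9, 1.8, 1.5], [2.8, 1.5, 2.6, 2.8]],
            ]
        )
        # Reference value from the definition: exp of the mean per-token
        # crossentropy over all 6 (unmasked) tokens.
        crossentropy = keras.losses.SparseCategoricalCrossentropy(
            from_logits=True, reduction="sum"
        )
        expected = tf.exp(crossentropy(y_true, y_pred) / 6.0)

        perplexity = Perplexity(from_logits=True)
        self.assertAllClose(perplexity(y_true, y_pred), expected)
```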
Thanks @abheesht17! Some style comments from me as well.
# See the License for the specific language governing permissions and
# limitations under the License.

from keras_nlp.metrics.perplexity import Perplexity
You need to add an import of metrics to the init file one directory up, otherwise the imports will not work in the exported package.
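Presumably something along these lines; the exact package layout is an assumption based on the file paths in this PR:

```python
# keras_nlp/metrics/__init__.py
from keras_nlp.metrics.perplexity import Perplexity

# keras_nlp/__init__.py (one directory up)
from keras_nlp import metrics
```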
class Perplexity(keras.metrics.Metric):
    """Perplexity metric.
    This class implements the perplexity metric. In short, this class calculates
We should add a lot of line breaks here: a blank line after the one-liner, a blank line after each paragraph, and blank lines before Args: and Examples:.
keras_nlp/metrics/perplexity.py (Outdated)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # y_true shape: (batch_size, seq_len)
        # y_pred shape: (batch_size, seq_len, vocab_size)
        y_true = tf.cast(y_true, self._dtype)
There is a dtype property on the metric, so this can be accessed as `self.dtype` (many other instances).
keras_nlp/metrics/perplexity.py (Outdated)

        **kwargs: Other keyword arguments.
    Examples:
    ```python
    # 1. update_state() and result()
You could do the core keras doc style here to show output...
https://github.com/keras-team/keras/blob/master/keras/metrics/metrics.py#L74
We don't have checks for this now, but we hope to add those soon. Output would make these more readable.
Also consider breaking these up with headers outside the code block, e.g.
Call the metric directly:
some code
Set padding id:
some code
@mattdangerw, @chenmoneygithub, addressed all comments. Thank you for the review!
Not sure why the unit tests are failing here. They run as expected locally.
Any idea, @mattdangerw, @chenmoneygithub? Error given here: https://github.com/keras-team/keras-nlp/runs/5797546465?check_suite_focus=true
Not sure how to fix this.
keras_nlp/metrics/perplexity.py (Outdated)

            sample_weight = tf.cast(sample_weight, self.dtype)

        # Calculate the Cross Entropy Loss.
        cross_entropy_value = tf.cast(
I checked out the source code of tf.keras.metrics.SparseCategoricalCrossentropy, and it is doing WEIGHTED_MEAN reduction (https://github.com/keras-team/keras/blob/d8fcb9d4d4dad45080ecfdd575483653028f8eda/keras/metrics.py#L583), which should automatically set the divisor as the sum over masks. Could you help verify it with your unit test? Thanks!
Sure. Will try this out! Thanks!
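A quick standalone check along those lines might look like this (the tensors are illustrative; whether the two values agree is exactly what is being verified):

```python
import tensorflow as tf

target = tf.constant([[1, 3, 0], [2, 1, 3]])
logits = tf.random.uniform(shape=[2, 3, 4], seed=42)
mask = tf.cast(tf.not_equal(target, 0), tf.float32)

# Metric version: WEIGHTED_MEAN reduction with the mask as sample_weight.
metric = tf.keras.metrics.SparseCategoricalCrossentropy(from_logits=True)
metric.update_state(target, logits, sample_weight=mask)

# Manual masked mean from the element-wise loss.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE
)
manual = tf.reduce_sum(loss(target, logits) * mask) / tf.reduce_sum(mask)

print(metric.result().numpy(), manual.numpy())  # compare the two reductions
```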
@chenmoneygithub, this particular UT failed:
____________________________________________________ PerplexityTest.test_two_inputs_from_logits ____________________________________________________
self = <keras_nlp.metrics.perplexity_test.PerplexityTest testMethod=test_two_inputs_from_logits>
def test_two_inputs_from_logits(self):
perplexity = Perplexity(from_logits=True, pad_token_id=0)
y_true_1 = tf.constant([[1, 3, 0], [2, 1, 3]])
y_pred_1 = tf.constant(
[
[
[1.034, 4.797, 2.82, 1.154],
[2.258, 1.591, 1.811, 1.852],
[3.216, 1.037, 0.3662, 2.7],
],
[
[1.363, 1.726, 1.898, 2.582],
[1.163, 1.943, 1.761, 1.497],
[2.766, 1.453, 2.61, 2.805],
],
]
)
perplexity_val = perplexity(y_true_1, y_pred_1)
self.assertAlmostEqual(perplexity_val, 2.8788896)
y_true_2 = tf.constant([[2, 0, 0], [1, 2, 3]])
y_pred_2 = tf.constant(
[
[
[2.887, 0.885, 2.973, 2.582],
[0.3838, 2.629, 1.91, 1.802],
[0.2578, 1.081, 1.125, 2.773],
],
[
[1.623, 2.784, 0.2109, 2.66],
[2.395, 2.01, 0.252, 1.828],
[0.4482, 2.629, 0.9697, 0.998],
],
]
)
perplexity_val = perplexity(y_true_2, y_pred_2)
> self.assertEqual(perplexity_val, 3.9998498)
E AssertionError: <tf.Tensor: shape=(), dtype=float32, numpy=3.3319612> != 3.9998498
keras_nlp\metrics\perplexity_test.py:132: AssertionError
I did a quick analysis on Colab. Apparently, when sample_weight is provided, the outputs of keras.losses.SparseCategoricalCrossentropy and keras.metrics.SparseCategoricalCrossentropy don't match. Have a look at this: https://colab.research.google.com/drive/1Jh44hylKVmmdqR1Z3B_redGzr1ZJdI3i?usp=sharing.
Is it some precision thingy which is messing up the values?
Let me know what the correct course of action is. Thanks!
This is pretty odd. I guess we can stick to the Loss function and open an issue for future investigation.
Coolio :). Thanks! 👍🏼
Your test failure seems to be caused by some env configuration:
I am not sure how that happens, will forward this to the team.
Hey, @chenmoneygithub. Any possible solution for this?
Hey, @chenmoneygithub! I've resolved the error we were facing with unit tests. Let me know if further changes are to be made. Thank you!
Looks good!
        perplexity_1.update_state(y_true_1, y_pred_1)
        perplexity_1.update_state(y_true_2, y_pred_2)
        self.assertAlmostEqual(
Please avoid directly checking the private members; if we think it is something necessary to check, then we should consider exposing a public interface for it.
For this case, can we just check the perplexity value?
Ah, yes. I just wanted to show that cross_entropy_for_state_1 + cross_entropy_for_state_2 = cross_entropy_for_state_3. But yeah, checking for perplexity will suffice. Changes made! I've kept the check for the number of samples, though.
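In other words, only the public accumulated value gets inspected across update_state() calls, roughly like this (hypothetical tensors, just to show the shape of the check):

```python
import tensorflow as tf

from keras_nlp.metrics.perplexity import Perplexity  # the class added by this PR

# Hypothetical inputs; the real tests use fixed constants defined per test case.
y_true_1 = tf.constant([[1, 3, 0], [2, 1, 3]])
y_pred_1 = tf.random.uniform(shape=[2, 3, 4], seed=1)
y_true_2 = tf.constant([[2, 0, 0], [1, 2, 3]])
y_pred_2 = tf.random.uniform(shape=[2, 3, 4], seed=2)

perplexity = Perplexity(from_logits=True, pad_token_id=0)
perplexity.update_state(y_true_1, y_pred_1)
perplexity.update_state(y_true_2, y_pred_2)
# Inspect only the public, accumulated value - not the private accumulators.
print(perplexity.result())
```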
Thanks for fixing! Please also apply similar changes on other test cases, basically we never want to explicitly check private fields, because private fields are subject to change without notice.
@chenmoneygithub, done! Thanks!
Thanks for fixing!
Thanks for the PR! Some drive-by comments.
keras_nlp/metrics/perplexity.py (Outdated)

    def __init__(
        self,
        name="perplexity",
The `name` argument should come last. It's a base class argument.
keras_nlp/metrics/perplexity.py (Outdated)

    def __init__(
        self,
        name="perplexity",
        dtype=None,
Same for dtype (which comes before name).
keras_nlp/metrics/perplexity.py (Outdated)

        self.pad_token_id = pad_token_id

        self._aggregate_cross_entropy = self.add_weight(
            name="aggregate_cross_entropy",
Nit: Spell "crossentropy" in a single word, for consistency. This applies to the weight name and also to Python variable names.
from tensorflow import keras


class Perplexity(keras.metrics.Metric):
Please make the metric serializable by adding a `get_config()` method.
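A sketch of what that could look like, added to a metric class that stores from_logits and pad_token_id on self in __init__ (for instance the PerplexitySketch outlined earlier in this thread); the real implementation may differ:

```python
    def get_config(self):
        # Base Metric config already carries `name` and `dtype`.
        config = super().get_config()
        config.update(
            {
                "from_logits": self.from_logits,
                "pad_token_id": self.pad_token_id,
            }
        )
        return config
```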
keras_nlp/metrics/perplexity.py (Outdated)

        from_logits: bool. If True, `y_pred` (input to `update_state()`) should
            be the logits as returned by the model. Otherwise, `y_pred` is a
            tensor of probabilities.
        pad_token_id: int. Token ID of the padding token. If provided, the mask
Also prefer "mask_token_id" over "pad_token_id"
keras_nlp/metrics/perplexity.py (Outdated)

            be the logits as returned by the model. Otherwise, `y_pred` is a
            tensor of probabilities.
        pad_token_id: int. Token ID of the padding token. If provided, the mask
            is computed by this class (all padding tokens are masked while
by -> for ?
keras_nlp/metrics/perplexity.py (Outdated)

        pad_token_id: int. Token ID of the padding token. If provided, the mask
            is computed by this class (all padding tokens are masked while
            computing the cross entropy loss). Note that if this field is
            provided, the `sample_weight` field in `update_state()` is ignored.
This behavior is problematic; we should combine the masks, not drop one of them.
What do you think is the best way to combine the masks? Element-wise maximum or element-wise addition (if both are not None)? Or do you have something else in mind?
I think we can just multiply the masks together. If a padding token id is set, that will give a mask of 1s and 0s, which could be multiplied with sample_weight.
Put one way... if a padding token has a sample weight of 0.5, it should be ignored. If a non-padding token has a sample weight of 0.5, we should still weight it by its sample weight before summing it in.
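A tiny standalone illustration of that combination, with made-up weights (not the PR's actual code):

```python
import tensorflow as tf

y_true = tf.constant([[1, 3, 0], [2, 1, 3]])
pad_token_id = 0
sample_weight = tf.constant([[1.0, 0.5, 0.5], [1.0, 1.0, 0.5]])

pad_mask = tf.cast(tf.not_equal(y_true, pad_token_id), tf.float32)
# Padding positions end up with weight 0; everything else keeps its user weight.
combined = sample_weight * pad_mask
# -> [[1. , 0.5, 0. ], [1. , 1. , 0.5]]
```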
Great! Done 👍🏼
Thank you, @fchollet, for the review comments! I've addressed all of them, save one. Want some clarification over there.
@mattdangerw, I've addressed your comment. Any further changes required?
LGTM!
* Add class for perplexity
* Fix doc-string example
* Add shape checks
* Small typo
* Small typo - 2
* Remove shape and rank checks - they don't work
* Add UTs
* Remove reshape ops
* Address review comments - I
* Add UT for no masking case
* Minor change
* Add a check for dtype
* Format code
* Address review comments - II, III
* Fix formatting issue
* Fix UTs (Attempt I)
* Fix UTs (Attempt II)
* Fix UTs (Attempt III)
* Remove UTs for private members
* Remove checks for private members in UTs
* Address review comments - IV
* Change sample_weight, mask functionality
* Format + Lint
Resolves #63.
Notebook: https://colab.research.google.com/drive/1XV1h5aeiy5IlHoQFjDTJ45hRC8wMSf16?usp=sharing
HF Reference: https://github.com/huggingface/transformers/blob/main/examples/research_projects/codeparrot/scripts/validation_loss.py#L56-L69