Add a perplexity metric #63

Closed
mattdangerw opened this issue Mar 25, 2022 · 6 comments · Fixed by #68

@mattdangerw (Member)

Splitting this issue out from #38.

We should add a perplexity metric as keras_nlp.metrics.Perplexity.

mattdangerw added the type:feature New feature or request label on Mar 25, 2022
@mattdangerw (Member, Author)

Looking through the colab, I think method 1 would be the correct approach, rather than method 2. Perplexity seems to me to be defined on a single input sequence, so averaging over all sequences in the batch (and then over all batches) seems reasonable.

I think @chenmoneygithub was going to take a look here too, so tagging for thoughts.
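
For concreteness, here is a rough sketch of how I read method 1 (the colab is the source of truth; the function name and shapes below are just placeholders for illustration): per-token cross-entropy, per-sequence perplexity, then a mean over the sequences in the batch.

```python
import tensorflow as tf

def batch_perplexity(y_true, logits):
    # y_true: (batch, seq_len) integer token ids.
    # logits: (batch, seq_len, vocab_size) unnormalized predictions.
    cross_entropy = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, logits, from_logits=True
    )  # (batch, seq_len) per-token cross-entropy.
    # Method 1: perplexity of each sequence is exp of its mean cross-entropy...
    per_sequence = tf.exp(tf.reduce_mean(cross_entropy, axis=-1))
    # ...then average the per-sequence perplexities over the batch.
    return tf.reduce_mean(per_sequence)
```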

@abheesht17 (Collaborator)

@chenmoneygithub (Contributor)

Agreed, method 1 looks correct.

One note about masking: checking token_id == 0 is not a reliable way to generate the mask, because users can customize their mask token ID (although I do not know why people would do so). The mask should be passed via the sample_weight argument instead.
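
A minimal sketch of how that could look, assuming a plain `tf.keras.metrics.Metric` subclass; this is not the actual keras_nlp implementation, and for brevity it accumulates a running exp(mean cross-entropy) rather than the per-sequence average discussed above. The point is only that the caller supplies the mask through `sample_weight`:

```python
import tensorflow as tf

class PerplexitySketch(tf.keras.metrics.Metric):
    """Accumulates masked cross-entropy; exponentiates in result()."""

    def __init__(self, from_logits=True, name="perplexity", **kwargs):
        super().__init__(name=name, **kwargs)
        self.from_logits = from_logits
        self.total_ce = self.add_weight(name="total_ce", initializer="zeros")
        self.total_weight = self.add_weight(name="total_weight", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Per-token cross-entropy, shape (batch, seq_len).
        ce = tf.keras.losses.sparse_categorical_crossentropy(
            y_true, y_pred, from_logits=self.from_logits
        )
        if sample_weight is None:
            # No mask provided by the caller: weight every token equally.
            sample_weight = tf.ones_like(ce)
        sample_weight = tf.cast(sample_weight, ce.dtype)
        # The caller decides what counts as padding and passes it here,
        # so the metric never assumes token_id == 0 means "masked".
        self.total_ce.assign_add(tf.reduce_sum(ce * sample_weight))
        self.total_weight.assign_add(tf.reduce_sum(sample_weight))

    def result(self):
        return tf.exp(self.total_ce / self.total_weight)
```

A caller with a non-zero padding id would then pass something like `sample_weight=tf.cast(y_true != pad_id, "float32")` to `update_state`.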

@abheesht17 (Collaborator)

Ah, nice 👍🏼. I'll make the necessary changes and open a PR.

@abheesht17 (Collaborator)

Hello, @mattdangerw, @chenmoneygithub! I've made some changes to the class. Please see this notebook: https://colab.research.google.com/drive/1XV1h5aeiy5IlHoQFjDTJ45hRC8wMSf16?usp=sharing.

I've followed this script: https://github.com/huggingface/transformers/blob/main/examples/research_projects/codeparrot/scripts/validation_loss.py#L56-L69.

In the notebook, I've compared our results with HF's results. The perplexity scores returned by both are very close to each other!

I'll open a PR for this now :D
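
For reference, the quantity that script computes is a corpus-level perplexity, i.e. exp of the mean per-token loss over the validation set. A TF-flavoured sketch of roughly the same check (`model`, `val_dataset`, and `pad_id` are placeholders here, not anything from this repo):

```python
import tensorflow as tf

def corpus_perplexity(model, val_dataset, pad_id=0):
    """exp(mean per-token cross-entropy) over a validation set."""
    total_ce = 0.0
    total_tokens = 0.0
    for token_ids in val_dataset:
        # Causal-LM shift: predict token t+1 from tokens up to t.
        inputs, labels = token_ids[:, :-1], token_ids[:, 1:]
        logits = model(inputs, training=False)
        ce = tf.keras.losses.sparse_categorical_crossentropy(
            labels, logits, from_logits=True
        )
        # Ignore padding positions when averaging.
        mask = tf.cast(tf.not_equal(labels, pad_id), ce.dtype)
        total_ce += float(tf.reduce_sum(ce * mask))
        total_tokens += float(tf.reduce_sum(mask))
    return float(tf.exp(total_ce / total_tokens))
```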
