Add a perplexity metric #63

Closed
mattdangerw opened this issue Mar 25, 2022 · 6 comments · Fixed by #68

@mattdangerw (Member)

Splitting this issue out from #38.

We should add a perplexity metric as keras_nlp.metrics.Perplexity.

mattdangerw added the type:feature New feature or request label on Mar 25, 2022
@mattdangerw (Member, Author)

Looking through the colab, I think method 1 would be the correct approach, rather than method 2. Perplexity seems to me to be defined on a single input sequence, so averaging over all sequences in the batch (and then over all batches) seems reasonable.

I think @chenmoneygithub was going to take a look here too, so tagging for thoughts.
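
For concreteness, here is a rough sketch of how I read method 1 (the colab is the source of truth; the function name and shapes below are just placeholders for illustration): per-token cross-entropy, per-sequence perplexity, then a mean over the sequences in the batch.

```python
import tensorflow as tf

def batch_perplexity(y_true, logits):
    # y_true: (batch, seq_len) integer token ids.
    # logits: (batch, seq_len, vocab_size) unnormalized predictions.
    cross_entropy = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, logits, from_logits=True
    )  # (batch, seq_len) per-token cross-entropy.
    # Method 1: perplexity of each sequence is exp of its mean cross-entropy...
    per_sequence = tf.exp(tf.reduce_mean(cross_entropy, axis=-1))
    # ...then average the per-sequence perplexities over the batch.
    return tf.reduce_mean(per_sequence)
```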

@abheesht17 (Collaborator)

@chenmoneygithub (Contributor)

Agreed, method 1 looks correct.

One note about masking: checking token_id == 0 is not a reliable way to generate the mask, because users can customize their mask token ID (although I do not know why people would do so). The mask should be passed via the sample_weight argument instead.
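
A minimal sketch of how that could look, assuming a plain `tf.keras.metrics.Metric` subclass; this is not the actual keras_nlp implementation, and for brevity it accumulates a running exp(mean cross-entropy) rather than the per-sequence average discussed above. The point is only that the caller supplies the mask through `sample_weight`:

```python
import tensorflow as tf

class PerplexitySketch(tf.keras.metrics.Metric):
    """Accumulates masked cross-entropy; exponentiates in result()."""

    def __init__(self, from_logits=True, name="perplexity", **kwargs):
        super().__init__(name=name, **kwargs)
        self.from_logits = from_logits
        self.total_ce = self.add_weight(name="total_ce", initializer="zeros")
        self.total_weight = self.add_weight(name="total_weight", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Per-token cross-entropy, shape (batch, seq_len).
        ce = tf.keras.losses.sparse_categorical_crossentropy(
            y_true, y_pred, from_logits=self.from_logits
        )
        if sample_weight is None:
            # No mask provided by the caller: weight every token equally.
            sample_weight = tf.ones_like(ce)
        sample_weight = tf.cast(sample_weight, ce.dtype)
        # The caller decides what counts as padding and passes it here,
        # so the metric never assumes token_id == 0 means "masked".
        self.total_ce.assign_add(tf.reduce_sum(ce * sample_weight))
        self.total_weight.assign_add(tf.reduce_sum(sample_weight))

    def result(self):
        return tf.exp(self.total_ce / self.total_weight)
```

A caller with a non-zero padding id would then pass something like `sample_weight=tf.cast(y_true != pad_id, "float32")` to `update_state`.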

@abheesht17 (Collaborator)

Ah, nice 👍🏼. I'll make the necessary changes and open a PR.

@abheesht17 (Collaborator)

Hello, @mattdangerw, @chenmoneygithub! I've made some changes to the class. Please see this notebook: https://colab.research.google.com/drive/1XV1h5aeiy5IlHoQFjDTJ45hRC8wMSf16?usp=sharing.

I've followed this script: https://github.com/huggingface/transformers/blob/main/examples/research_projects/codeparrot/scripts/validation_loss.py#L56-L69.

In the notebook, I've compared our results with HF's results. The perplexity scores returned by both are very close to each other!

I'll open a PR for this now :D
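
For reference, the quantity that script computes is a corpus-level perplexity, i.e. exp of the mean per-token loss over the validation set. A TF-flavoured sketch of roughly the same check (`model`, `val_dataset`, and `pad_id` are placeholders here, not anything from this repo):

```python
import tensorflow as tf

def corpus_perplexity(model, val_dataset, pad_id=0):
    """exp(mean per-token cross-entropy) over a validation set."""
    total_ce = 0.0
    total_tokens = 0.0
    for token_ids in val_dataset:
        # Causal-LM shift: predict token t+1 from tokens up to t.
        inputs, labels = token_ids[:, :-1], token_ids[:, 1:]
        logits = model(inputs, training=False)
        ce = tf.keras.losses.sparse_categorical_crossentropy(
            labels, logits, from_logits=True
        )
        # Ignore padding positions when averaging.
        mask = tf.cast(tf.not_equal(labels, pad_id), ce.dtype)
        total_ce += float(tf.reduce_sum(ce * mask))
        total_tokens += float(tf.reduce_sum(mask))
    return float(tf.exp(total_ce / total_tokens))
```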
