
Add helper to generate an eval result model-index, with proper typing #382

Merged: 6 commits merged into main from model-index-types on Oct 11, 2021

Conversation

julien-c (Member) commented Oct 1, 2021

I think this is helpful (and could replace some code in transformers) but let me know, it can probably be improved.

The "types" themselves are generated from the JTD schema used in validation in our internal repo (but I didn't keep the to/from json serialization code as we're not using it)

See also internal context: https://huggingface.slack.com/archives/C01BWJU0YKW/p1632159653242300?thread_ts=1632158986.236100&cid=C01BWJU0YKW
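
For illustration, a minimal sketch of how the helper might be called in the simplest case; the keyword names below are assumed for illustration and may not match the merged signature exactly:

```python
from huggingface_hub import metadata_eval_result

# Hypothetical single-task, single-metric call; parameter names and values
# are placeholders, not the authoritative signature.
metadata = metadata_eval_result(
    model_pretty_name="My fine-tuned model",
    task_pretty_name="Text Classification",
    task_id="text-classification",
    metrics_pretty_name="Accuracy",
    metrics_id="accuracy",
    metrics_value=0.91,
    dataset_pretty_name="My Dataset",
    dataset_id="my-org/my-dataset",
)
# `metadata` is a plain dict with a "model-index" key, ready to be dumped
# into the YAML front matter of a model card.
```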

julien-c (Member, Author) commented Oct 1, 2021

(CI failure is unrelated)

@@ -0,0 +1,83 @@
# Code generated by jtd-codegen for Python v0.3.1
Contributor commented on this diff:

Nice! Could it be useful to add a comment explaining how to update this file and propose a PR if/when the original schema changes?

elishowk (Contributor) left a comment

Just one comment that led to another general question: shouldn't we publish the JTD schema in this repo as a static asset, to be used with the jtd-codegen Python code generation command that generates src/huggingface_hub/repocard_types.py?

julien-c (Member, Author) commented Oct 4, 2021

At some point we can do that, but for now I think we can merge it like this?

I'll wait for reviews from a few other reviewers before merging.

lvwerra (Member) commented Oct 4, 2021

That looks very useful, thanks for adding it! Just one question: how would one go about adding multiple metrics? Would one call the metadata_eval_result function for each metric and concatenate the lists under 'model-index' in the dict? I have this example of a use case that even has several tasks.

julien-c (Member, Author) commented Oct 4, 2021

No, in that case I would say you create the ModelIndex object yourself, like what's inside metadata_eval_result (or just create the dict by hand).

Or do you think it's common enough to warrant another helper?
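
For illustration, a rough sketch of the "create the dict by hand" route with several metrics attached to one result; all names and values below are placeholders, and the structure follows the model-index metadata format:

```python
# Building the "model-index" metadata dict directly, with two metrics
# attached to a single evaluation result.
metadata = {
    "model-index": [
        {
            "name": "my-fine-tuned-model",
            "results": [
                {
                    "task": {"name": "Text Classification", "type": "text-classification"},
                    "dataset": {"name": "My Dataset", "type": "my-org/my-dataset"},
                    "metrics": [
                        {"name": "Accuracy", "type": "accuracy", "value": 0.91},
                        {"name": "F1", "type": "f1", "value": 0.89},
                    ],
                }
            ],
        }
    ]
}
```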

lvwerra (Member) commented Oct 4, 2021

The use case with several metrics is pretty common, I would say (e.g. some flavour of accuracy/precision/recall/F1 for classification-type tasks such as sequence/token classification or QA), and it is also what the Trainer supports with custom metrics. Having several evaluation tasks is probably rarer in real use cases and more academic. What do you think?

LysandreJik (Member) left a comment

Thanks for working on this PR! I'm not entirely certain I understand the use case: I think the addition of the dataclasses ModelIndex, SingleMetric, etc. is useful, but I feel like the metadata_eval_result helper is too limiting.

Results are a list, metrics are a list, but this only offers a single possibility for both results and metrics. I think it's a bit misleading and doesn't capture the array of possibilities available thanks to the modelcard format. This tool would be useful to build an initial modelcard with a single evaluation/single metric, but would not work to complete that same card with additional metrics or evaluation results, nor would it work for multiple evaluations or multiple metrics out of the box.

I would push for users to build their own SingleResult objects made up of their own SingleMetric objects, which would give them much more flexibility and understanding of what is possible with the current implementation.
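
For illustration, a hedged sketch of that approach using the dataclasses added in this PR; the SingleResultTask/SingleResultDataset names and the constructor fields are assumptions based on the generated file and may differ from the actual code:

```python
from huggingface_hub.repocard_types import (
    ModelIndex,
    SingleMetric,
    SingleResult,
    SingleResultDataset,  # assumed name of the dataset dataclass
    SingleResultTask,     # assumed name of the task dataclass
)

# One evaluation result carrying two metrics; further SingleResult objects
# could be appended to `results` for additional tasks or datasets.
result = SingleResult(
    task=SingleResultTask(name="Text Classification", type="text-classification"),
    dataset=SingleResultDataset(name="My Dataset", type="my-org/my-dataset"),
    metrics=[
        SingleMetric(name="Accuracy", type="accuracy", value=0.91),
        SingleMetric(name="F1", type="f1", value=0.89),
    ],
)
index = ModelIndex(name="my-fine-tuned-model", results=[result])
```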

julien-c (Member, Author) commented Oct 5, 2021

Yeah, I think metadata_eval_result is more like a documented example of the simplest use case of "ModelIndex and friends", one that gives your model enough data to get a spot on leaderboards on hf.co

Happy to rework and/or document if this is not clear enough, or you can take over and push this PR over the line if you'd like

Thanks!

LysandreJik (Member) left a comment

Agreed to have this as a first approach on which we can build in the future. Thanks, @julien-c, and thanks for implementing tests too.

osanseviero (Contributor) left a comment

Given all other reviewers are happy with this, I'm happy as well. Thanks for proposing, implementing and testing this!

LysandreJik merged commit 60c11fd into main on Oct 11, 2021
LysandreJik deleted the model-index-types branch on Oct 11, 2021 at 19:12