Add helper to generate an eval result model-index, with proper typing #382
Conversation
(CI failure is unrelated)
@@ -0,0 +1,83 @@
# Code generated by jtd-codegen for Python v0.3.1
Nice! Could it be useful to add a comment explaining how to update this file and propose a PR if/when the original schema changes?
Just one comment that led to another general question: shouldn't we publish the JTD schema in this repo as a static asset, to be used with the python jtd-codegen command that generates src/huggingface_hub/repocard_types.py?
At some point we can do it, but for now I think we can merge like this? Will wait for a few other reviewers' reviews before merging.
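For reference, regenerating `src/huggingface_hub/repocard_types.py` from an updated schema should roughly amount to running `jtd-codegen <path-to-model-index-schema>.jtd.json --python-out src/huggingface_hub/` and opening a PR with the regenerated file (the schema path here is a placeholder, since the schema currently lives in the internal repo).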
That looks very useful - thanks for adding it! Just one question: how would one go about adding multiple metrics? Would one call the metadata_eval_result helper multiple times?
No, in that case I would say you would create the underlying objects yourself. Or do you think it's common enough to warrant another helper?
The use case with several metrics is pretty common I would say (e.g. some flavour of accuracy/precision/recall/F1 for classification-type tasks such as sequence/token classification or QA).
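For illustration, one way to cover the multi-metric case with the helper as proposed would be to build the single-metric dict and then extend its metrics list by hand. This is only a sketch: the keyword names and the exact shape of the returned dict are assumptions, not something lifted from the diff.

```python
from huggingface_hub.repocard import metadata_eval_result

# Single-metric `model-index` dict produced by the helper
# (keyword names are assumed, not copied from the diff).
metadata = metadata_eval_result(
    model_pretty_name="my-ner-model",
    task_pretty_name="Token Classification",
    task_id="token-classification",
    metrics_pretty_name="F1",
    metrics_id="f1",
    metrics_value=0.91,
    dataset_pretty_name="CoNLL-2003",
    dataset_id="conll2003",
)

# Append further metrics to the same result entry by hand, assuming the
# returned dict follows the standard model-index layout.
metadata["model-index"][0]["results"][0]["metrics"].extend(
    [
        {"name": "Precision", "type": "precision", "value": 0.90},
        {"name": "Recall", "type": "recall", "value": 0.92},
    ]
)
```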
Thanks for working on this PR! I'm not entirely certain I understand the use case: I think the addition of the dataclasses ModelIndex, SingleMetric, etc. is useful, but I feel like metadata_eval_result is too limiting.
Results are a list, metrics are a list, but this only offers a single possibility for both results and metrics. I think it's a bit misleading and doesn't capture the array of possibilities available thanks to the modelcard format. This tool would be useful to build an initial modelcard with a single evaluation/single metric, but would not work to complete that same card with additional metrics or evaluation results, nor would it work for multiple evaluations or multiple metrics out of the box.
I would push for users to build their own SingleResult objects made up of their own SingleMetric objects, which would give them much more flexibility and understanding of what is possible with the current implementation.
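A rough sketch of what that could look like with the generated types; the class and field names below (SingleResultTask, SingleResultDataset, and the name/type/value fields) are my guess at what jtd-codegen emits in repocard_types.py, not something confirmed by this PR:

```python
from huggingface_hub.repocard_types import (
    ModelIndex,
    SingleMetric,
    SingleResult,
    SingleResultDataset,
    SingleResultTask,
)

# One evaluation result carrying several metrics, assembled by hand.
result = SingleResult(
    task=SingleResultTask(name="Question Answering", type="question-answering"),
    dataset=SingleResultDataset(name="SQuAD", type="squad"),
    metrics=[
        SingleMetric(name="Exact Match", type="exact_match", value=80.1),
        SingleMetric(name="F1", type="f1", value=88.3),
    ],
)
model_index = ModelIndex(name="my-qa-model", results=[result])
```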
Yeah, I think that's fair. Happy to rework and/or document this if it's not clear enough, or you can take over and push this PR over the line if you'd like. Thanks!
Agreed to have this as a first approach on which we can build in the future. Thanks, @julien-c, and thanks for implementing tests too.
Given all other reviewers are happy with this, I'm happy as well. Thanks for proposing, implementing and testing this!
I think this is helpful (and could replace some code in transformers) but let me know, it can probably be improved. The "types" themselves are generated from the JTD schema used for validation in our internal repo (but I didn't keep the to/from JSON serialization code as we're not using it).
See also internal context: https://huggingface.slack.com/archives/C01BWJU0YKW/p1632159653242300?thread_ts=1632158986.236100&cid=C01BWJU0YKW
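For context, a minimal usage sketch of the helper; the import path and keyword names are assumptions based on the discussion above rather than a verbatim copy of the diff:

```python
from huggingface_hub.repocard import metadata_eval_result

# Build `model-index` metadata for a single evaluation result; the resulting
# dict can then be merged into the model card's YAML front matter.
metadata = metadata_eval_result(
    model_pretty_name="RoBERTa fine-tuned on ReactionGIF",
    task_pretty_name="Text Classification",
    task_id="text-classification",
    metrics_pretty_name="Accuracy",
    metrics_id="accuracy",
    metrics_value=0.2662,
    dataset_pretty_name="ReactionGIF",
    dataset_id="julien-c/reactiongif",
)
# Expected shape (roughly):
# {"model-index": [{"name": ..., "results": [
#     {"task": {...}, "dataset": {...}, "metrics": [{...}]}]}]}
```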