
Repo metadata load and save #339

Merged
merged 8 commits into from
Sep 15, 2021

@julien-c (Member) commented Sep 9, 2021

Tentatively closes #54.

Example use case:

@sgugger (Contributor) left a comment

That's definitely useful, thanks a lot for adding this!

@julien-c requested a review from @elishowk on September 9, 2021 at 21:39
@osanseviero (Contributor) commented

Thanks so much for this! This is a much-needed upgrade! ❤️‍🔥

This will be very useful for all our third-party integrations if we ever need to modify all the repos. Once this is merged, we should announce it to all the big libraries and orgs (e.g. Stanford NLP).

@julien-c (Member, Author) commented Sep 9, 2021

(current status: spamming the CI to try and get all transient errors to go green 😅)

@osanseviero (Contributor) left a comment

Good stuff!

Review comment on src/huggingface_hub/repocard.py (outdated, resolved)
Co-authored-by: Omar Sanseviero
@LysandreJik (Member) left a comment

Thank you for working on that @julien-c, that's very useful!

As discussed with @elishowk this morning, it would be nice to eventually have the same validation here as the one enforced in the backend; transformers could then leverage it (maybe not in this PR :)).

@@ -13,6 +13,7 @@
TF_WEIGHTS_NAME = "model.ckpt"
FLAX_WEIGHTS_NAME = "flax_model.msgpack"
CONFIG_NAME = "config.json"
MODELCARD_NAME = "README.md"
Reviewer (Member):

You're setting MODELCARD_NAME here but using the "repocard" term elsewhere. Which should it be? If you're looking for a repo-type-agnostic name, I would favor using REPOCARD_NAME everywhere.

        else:
            raise ValueError("repo card metadata block should be a dict")
    else:
        return None
Reviewer (Member):

This should probably raise an error too, right? Running metadata_load on any random file will return None, whereas I would expect it to fail if it doesn't find any metadata.

Reviewer (Member):

If that's the expected behavior, I would also pin it in a test to make sure it doesn't switch to raising an error in the future.

Author (Member):

As discussed offline, we have a lot of repo cards (model cards, etc.) that do not have a YAML block, and I think the expected behavior here is to return None. If need be, we can change this in a subsequent PR.
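
A minimal, dependency-free sketch of the load/save behavior discussed in this thread, assuming the metadata lives in a front-matter block delimited by "---" lines at the top of the card. metadata_load returns None when a card has no metadata block and raises ValueError when the block is not a mapping; metadata_save (an assumed name for the "save" half of this PR) rewrites or prepends the block while leaving the card body untouched. This is not the code merged in this PR: the regex and the hypothetical parse_flat_yaml helper, which only understands flat "key: value" lines, stand in for a full YAML parser such as PyYAML's yaml.safe_load.

```python
import re
from pathlib import Path
from typing import Dict, Optional, Union

# Hypothetical pattern for this sketch: a "---" line at the very start of the
# file, the metadata lines, then a closing "---" line. The Hub's own regex may differ.
METADATA_BLOCK = re.compile(
    r"\A---\r?\n(.*?)^---[ \t]*(?:\r?\n|\Z)", re.DOTALL | re.MULTILINE
)


def parse_flat_yaml(block: str) -> Dict[str, str]:
    """Hypothetical stand-in for yaml.safe_load: flat `key: value` lines only."""
    data: Dict[str, str] = {}
    for line in block.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep:
            # A real YAML parser would return a scalar here, not a mapping.
            raise ValueError("repo card metadata block should be a dict")
        data[key.strip()] = value.strip()
    return data


def metadata_load(local_path: Union[str, Path]) -> Optional[Dict[str, str]]:
    """Return the card's metadata, or None if the card has no metadata block."""
    content = Path(local_path).read_text(encoding="utf-8")
    match = METADATA_BLOCK.search(content)
    if match is None:
        # Many existing repo cards have no metadata block: return None, don't raise.
        return None
    return parse_flat_yaml(match.group(1))


def metadata_save(local_path: Union[str, Path], data: Dict[str, str]) -> None:
    """Write `data` as the card's metadata block, keeping the card body as-is."""
    path = Path(local_path)
    content = path.read_text(encoding="utf-8") if path.exists() else ""
    lines = "".join(f"{key}: {value}\n" for key, value in data.items())
    header = f"---\n{lines}---\n"
    match = METADATA_BLOCK.search(content)
    # Replace an existing block, or prepend a new one if the card has none.
    body = content[match.end():] if match else content
    path.write_text(header + body, encoding="utf-8")
```

Pinning metadata_load(card) is None for a card without a metadata block in a unit test, as suggested in review, would guard against that behavior silently changing to raising an error.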

@julien-c (Member, Author) commented

@LysandreJik I attempted to resolve the merge conflict.

This should be ready to merge, pending final maintainer approval (😘)

@elishowk (Contributor) left a comment

LGTM

@LysandreJik (Member) commented

Will push an additional test and merge in a bit.

@LysandreJik (Member) commented

Thank you for your work @julien-c

@LysandreJik merged commit e665982 into main on Sep 15, 2021
@LysandreJik deleted the repo_metadata_load_and_save branch on September 15, 2021 at 14:54
@elishowk (Contributor) commented

> (current status: spamming the CI to try and get all transient errors to go green 😅)

A note about that issue, which I have also run into while working on PRs: the huggingface_hub and transformers CI often fail because of 504 timeouts from moon-staging. If needed, I can work on increasing this staging server's capacity.

Linked issue: Feature Request: Have programatic way of adding metadata to a repo
5 participants