Write Mixins/Integration guide #1362

Merged
merged 18 commits into from
Mar 2, 2023

Contributor

@Wauplin Wauplin commented Feb 24, 2023

Resolves #1333 (from @NielsRogge's suggestion in a private Slack).

This PR adds a new guide page, "Integrate any ML framework with the Hub". It covers:

  1. What is an integration?
  2. What is the ModelHubMixin, and how do you use it?
  3. A concrete example with PyTorchModelHubMixin: how to use it and how it has been implemented

(+ edited some docstrings)

Writing this guide made me realize we can improve the ModelHubMixin class even further, especially for cases where we want to delete files from the Hub (when pushing multiple times). This is done separately in the Keras integration to replace training logs. I think we can change this once #1352 is implemented.

(Note: failing tests are unrelated to this PR)

@Wauplin Wauplin added the documentation Improvements or additions to documentation label Feb 24, 2023
HuggingFaceDocBuilderDev commented Feb 24, 2023

The documentation is not available anymore as the PR was closed or merged.

@Wauplin Wauplin marked this pull request as ready for review February 27, 2023 11:27
Contributor

@osanseviero osanseviero left a comment

Overall looks good! I'm not convinced about making this the officially recommended way to build integrations, though.

docs/source/guides/integrations.mdx (outdated)
In most cases, a library already implements its model using a Python class. The class contains the properties of the model
and methods to load, run, train, evaluate,... it. It would be nice to extend the class with the 2 new methods!

The recommended approach to do so is to use inheritance, and more precisely mixins. A [Mixin](https://stackoverflow.com/a/547714)
Contributor

Do we want to make this the officially recommended approach? This is more complex/scary than the usual push-to-hub/download_from_hub approach with the HTTP endpoints, and it requires libraries to expose their models as classes.
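
For readers weighing this concern, the mixin pattern under discussion can be sketched without touching the Hub at all. The block below is a minimal, hypothetical stand-in, not the `ModelHubMixin` implementation: `LocalHubMixin`, `TinyModel`, the hook names, and the `weights.json` file are all assumptions made for illustration, and a local directory plays the role of a Hub repo.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


class LocalHubMixin:
    """Hypothetical stand-in for a Hub mixin: a local directory acts as the repo."""

    def save_pretrained(self, save_directory):
        # Generic part owned by the mixin: prepare the target directory.
        save_directory = Path(save_directory)
        save_directory.mkdir(parents=True, exist_ok=True)
        # Framework-specific part delegated to the subclass.
        self._save_pretrained(save_directory)
        return save_directory

    @classmethod
    def from_pretrained(cls, directory):
        # Generic entry point; the subclass decides how to rebuild the model.
        return cls._from_pretrained(Path(directory))

    def _save_pretrained(self, save_directory):
        raise NotImplementedError

    @classmethod
    def _from_pretrained(cls, directory):
        raise NotImplementedError


class TinyModel(LocalHubMixin):
    """Toy model class, as a library might define it (assumed for illustration)."""

    def __init__(self, weights):
        self.weights = weights

    def _save_pretrained(self, save_directory):
        (save_directory / "weights.json").write_text(json.dumps(self.weights))

    @classmethod
    def _from_pretrained(cls, directory):
        return cls(json.loads((directory / "weights.json").read_text()))


# Round trip through a temporary directory standing in for a repo.
with TemporaryDirectory() as tmp:
    TinyModel([0.1, 0.2]).save_pretrained(Path(tmp) / "repo")
    restored = TinyModel.from_pretrained(Path(tmp) / "repo")
```

In this sketch the library author writes only the two hooks and inherits the public methods, which is the appeal of the approach. The cost raised in the comment is also visible: it only works if the library already represents its models as classes.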

Member

@stevhliu stevhliu left a comment

Really nice, I love the link to the guide from the docstring!

Contributor

@merveenoyan merveenoyan left a comment

I loved the guide, just left some nits

@Wauplin
Contributor Author

Wauplin commented Feb 28, 2023

Thank you very much @osanseviero @merveenoyan @stevhliu for the detailed review, it's really appreciated.

Apart from all the small suggestions that I will follow (I feel that half of the guide is rewritten, and I'm glad it is the case! 😄), the main question is "do we want to brand that as our recommended way to integrate a library?" I think what I'll do is rearrange the guide a bit:

  • do not mention that we have a recommended way
  • mention the possibility of implementing simple methods (+ having an example for that)
  • mention the advantages/drawbacks of both approaches. So far I think the mixin method is the more robust, in the sense that repo creation/PR creation/revisions/... are not handled by the external library (i.e. less maintenance on their side), but at the cost of a more rigid framework

WDYT?
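
As a rough sketch of the "simple methods" option in the list above: two plain functions that call `huggingface_hub` directly. `HfApi.create_repo`, `HfApi.upload_folder`, and `hf_hub_download` are existing `huggingface_hub` calls; everything else (the function names, the `weights.json` file, JSON serialization, and the optional `api`/`download` parameters used to inject a client) is assumed for illustration and is not taken from the guide.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def push_to_hub(weights, repo_id, api=None):
    """Upload JSON-serializable `weights` to the repo `repo_id`."""
    if api is None:
        # Imported lazily so this module loads without huggingface_hub installed.
        from huggingface_hub import HfApi

        api = HfApi()
    # With helpers, the library itself has to create the repo first.
    api.create_repo(repo_id, exist_ok=True)
    with TemporaryDirectory() as tmpdir:
        (Path(tmpdir) / "weights.json").write_text(json.dumps(weights))
        api.upload_folder(repo_id=repo_id, folder_path=tmpdir)


def load_from_hub(repo_id, download=None):
    """Download `weights.json` from `repo_id` and return its parsed content."""
    if download is None:
        from huggingface_hub import hf_hub_download

        download = hf_hub_download
    # The download function returns the local path of the cached file.
    local_path = download(repo_id=repo_id, filename="weights.json")
    return json.loads(Path(local_path).read_text())
```

Nothing here needs a model class, which keeps the helper route simple. The trade-off described above is that repo creation, and anything further such as revisions or PRs, would have to be added and maintained by the library in these functions.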

@stevhliu
Member

I think that's a great idea! Having a table concisely describing the pros/cons of both approaches would also be super nice to have

@osanseviero
Contributor

Good idea! 🔥

@Wauplin Wauplin marked this pull request as draft March 1, 2023 14:39
Wauplin and others added 4 commits March 1, 2023 15:43
Co-authored-by: Omar Sanseviero
Co-authored-by: Steven Liu
Co-authored-by: Merve Noyan
Co-authored-by: Merve Noyan
Co-authored-by: Steven Liu
Co-authored-by: Omar Sanseviero
codecov bot commented Mar 1, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.01 ⚠️

Comparison is base (b93803f) 84.53% compared to head (d673b36) 84.53%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1362      +/-   ##
==========================================
- Coverage   84.53%   84.53%   -0.01%     
==========================================
  Files          48       48              
  Lines        4812     4810       -2     
==========================================
- Hits         4068     4066       -2     
  Misses        744      744              
Impacted Files Coverage Δ
src/huggingface_hub/keras_mixin.py 94.16% <ø> (ø)
src/huggingface_hub/hub_mixin.py 94.04% <100.00%> (-0.14%) ⬇️


@Wauplin Wauplin marked this pull request as ready for review March 1, 2023 16:45
@Wauplin
Contributor Author

Wauplin commented Mar 1, 2023

@osanseviero @stevhliu @merveenoyan Following my last comment, I made some changes to the guide:

  • integrated all of your feedback (❤️)
  • added a section "A first approach: helpers"
  • kept/renamed the mixin section "A more complex approach: class inheritance". I've updated the first paragraph to transition from the section above.
  • added a "Quick comparison" section at the bottom to quickly sum up the 2 approaches. I was not very inspired about what to add to this table, so I kept it quite minimal. It's not meant to be understood by someone who has not read the full guide.

[Screenshot: 2023-03-01_17-49]

Let me know if you have new feedback :)

Member

@stevhliu stevhliu left a comment

Great job and the table is awesome!

docs/source/guides/integrations.mdx (outdated)
This is of course only an example. If you are interested in more complex manipulations (delete remote files, upload
weights on the fly, persist weights locally,...) please refer to the [upload files](./upload) guide.

### Limitations
Member

I'd move the Limitations section above the from_pretrained method. I think it'd be nice to have a full picture of the pros/cons of using this approach before actually trying to implement it.

Contributor Author

I understand your point, but I think it could confuse users, at least with the text I wrote. If I move the "Limitations" section up, the reader would not yet have the method signatures/implementation in mind when I list everything that is missing from it.

What I can do is add a sentence in the introduction to say "hey, we will see 2 different approaches, each of them has advantages/drawbacks, so choose what's best for you". That way, when users read the first approach, they already have in mind that there are some limitations (without knowing which ones yet). What do you think?

Member

Great idea! 👍

@Wauplin
Contributor Author

Wauplin commented Mar 2, 2023

@stevhliu Thanks again for the detailed review! I've addressed all of your comments (see above about the limitations section). I think we are close to having a final version to merge :)

@Wauplin
Contributor Author

Wauplin commented Mar 2, 2023

Merging this!

@Wauplin Wauplin merged commit 3a051bd into main Mar 2, 2023
@Wauplin Wauplin deleted the mixins-guide branch March 2, 2023 18:32
@osanseviero
Contributor

The table is amazing! 🔥

Labels
documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

Add a guide of Pytorch Mixins
5 participants