Releases: huggingface/huggingface_hub
v0.4.0: Tag listing, Namespace Objects, Model Filter
Tag listing
- Introduce Tag Listing by @muellerzr in #537
This PR introduces the ability to fetch all available tags for models or datasets, returned as a nested namespace object, for example:
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> tags = api.get_model_tags()
>>> print(tags)
Available Attributes:
* benchmark
* language_creators
* languages
* licenses
* multilinguality
* size_categories
* task_categories
* task_ids
>>> print(tags.benchmark)
Available Attributes:
* raft
* superb
* test
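To make the idea of a nested namespace concrete, here is a minimal, self-contained sketch. Each dictionary key becomes an attribute, and printing an object lists the attribute names. The class name TagNamespace and the sample tag values are hypothetical. The printed format is modeled on the output above and is not the library's implementation.

```python
class TagNamespace:
    """Hypothetical nested namespace: dictionary keys become attributes."""

    def __init__(self, mapping):
        for key, value in mapping.items():
            # Nested dicts become nested namespaces, so tags.benchmark.raft works.
            # Keys are assumed to be valid Python identifiers.
            if isinstance(value, dict):
                value = TagNamespace(value)
            setattr(self, key, value)

    def __dir__(self):
        # Interactive shells consult dir() for attribute completion.
        return sorted(vars(self))

    def __repr__(self):
        lines = ["Available Attributes:"]
        lines += [f"* {name}" for name in sorted(vars(self))]
        return "\n".join(lines)


tags = TagNamespace({
    "benchmark": {"raft": "raft", "superb": "superb", "test": "test"},
    "licenses": {"mit": "mit", "apache_2_0": "apache-2.0"},
})
print(tags.benchmark)
```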
Namespace objects
- Namespace Objects for Search Parameters by @muellerzr in #556
With the goal of adding more tab completion to the library, this PR introduces two objects:
- DatasetSearchArguments
- ModelSearchArguments
These two AttributeDictionary objects contain all the valid information we can extract from a model or dataset as tab-complete parameters. Through careful string splitting, they also include the author_or_organization and dataset_name (or model_name) values.
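The "careful string splitting" mentioned above can be sketched as follows: separate a repo id into author and name, then collect the distinct values that could back tab completion. The helpers split_repo_id and collect_choices are hypothetical. The sketch also assumes that ids without a "/" (such as "gpt2") have no author part.

```python
from typing import Iterable, List, Optional, Tuple


def split_repo_id(repo_id: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split "author/name" into (author, name)."""
    # Repos without a namespace (e.g. "gpt2") have no author part.
    if "/" not in repo_id:
        return None, repo_id
    author, name = repo_id.split("/", 1)
    return author, name


def collect_choices(repo_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Gather the distinct authors and names that could back tab completion."""
    authors, names = set(), set()
    for repo_id in repo_ids:
        author, name = split_repo_id(repo_id)
        if author is not None:
            authors.add(author)
        names.add(name)
    return sorted(authors), sorted(names)


authors, names = collect_choices(["microsoft/wavlm-base-sd", "facebook/wav2vec2-base", "gpt2"])
print(authors)  # ['facebook', 'microsoft']
print(names)    # ['gpt2', 'wav2vec2-base', 'wavlm-base-sd']
```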
Model Filter
- Implement a Model Filter class by @muellerzr in #553
This PR introduces a new way to search the Hub: the ModelFilter class.
At first glance it behaves like a simple Enum, letting users specify what they want to search for, such as:
f = ModelFilter(author="microsoft", model_name="wavlm-base-sd", framework="pytorch")
From there, they can pass this filter to the list_models method of HfApi to search the Hub:
models = api.list_models(filter=f)
The API may then be used for complex queries:
args = ModelSearchArguments()
f = ModelFilter(framework=[args.library.pytorch, args.library.TensorFlow], model_name="bert", tasks=[args.pipeline_tag.Summarization, args.pipeline_tag.TokenClassification])
api.list_models(filter=f)
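The Hub performs this filtering on the server. The client-side sketch below only makes plausible matching semantics concrete, and several parts of it are hypothetical: the LocalModelFilter class, the record layout, and the rules themselves. Those rules are that a list-valued field matches if any of its values matches, and that model_name is a substring match. None of this describes how ModelFilter is actually implemented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Union

StrOrMany = Union[str, Iterable[str], None]


@dataclass
class LocalModelFilter:
    """Hypothetical, client-side stand-in for ModelFilter."""
    author: Optional[str] = None
    model_name: Optional[str] = None
    framework: StrOrMany = None
    tasks: StrOrMany = None


def _as_set(value: StrOrMany) -> Optional[Set[str]]:
    # Accept a single string or any iterable of strings; None means "no constraint".
    if value is None:
        return None
    return {value} if isinstance(value, str) else set(value)


def matches(f: LocalModelFilter, model: dict) -> bool:
    author, _, name = model["id"].rpartition("/")
    if f.author is not None and author != f.author:
        return False
    if f.model_name is not None and f.model_name not in name:
        return False
    frameworks = _as_set(f.framework)
    if frameworks is not None and not frameworks & set(model["libraries"]):
        return False
    tasks = _as_set(f.tasks)
    if tasks is not None and model["pipeline_tag"] not in tasks:
        return False
    return True


def filter_models(f: LocalModelFilter, models: List[dict]) -> List[str]:
    return [m["id"] for m in models if matches(f, m)]


models = [
    {"id": "org-a/bert-tiny-ner", "libraries": ["pytorch"], "pipeline_tag": "token-classification"},
    {"id": "org-b/bert-sum", "libraries": ["tensorflow"], "pipeline_tag": "summarization"},
    {"id": "org-a/gpt-small", "libraries": ["pytorch"], "pipeline_tag": "text-generation"},
    {"id": "org-c/bert-jax", "libraries": ["jax"], "pipeline_tag": "summarization"},
]
f = LocalModelFilter(
    framework=["pytorch", "tensorflow"],
    model_name="bert",
    tasks=["summarization", "token-classification"],
)
print(filter_models(f, models))  # ['org-a/bert-tiny-ner', 'org-b/bert-sum']
```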
Ignoring filenames in snapshot_download
This PR introduces a way to limit the files fetched by snapshot_download. This is useful when you want to download and cache an entire repository without using git while skipping some files based on their filenames.
- [Snapshot download] allow some filenames to be ignored by @patrickvonplaten in #566
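These notes do not name the new snapshot_download parameter. The sketch below therefore shows only the underlying idea: drop filenames that match a regular expression before downloading. The helper filter_repo_files and its ignore_pattern argument are hypothetical.

```python
import re
from typing import Iterable, List, Optional


def filter_repo_files(filenames: Iterable[str], ignore_pattern: Optional[str] = None) -> List[str]:
    """Hypothetical helper: keep only the files a snapshot should download."""
    if ignore_pattern is None:
        return list(filenames)
    pattern = re.compile(ignore_pattern)
    # re.search looks anywhere in the path, so nested files are matched too.
    return [name for name in filenames if not pattern.search(name)]


repo_files = ["config.json", "pytorch_model.bin", "tf_model.h5", "flax_model.msgpack", "README.md"]
print(filter_repo_files(repo_files, r"\.(h5|msgpack)$"))
# ['config.json', 'pytorch_model.bin', 'README.md']
```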
What's Changed
- [Hotfix][API] card_data => cardData on /api/datasets by @julien-c in #530
- Fix the progress bars when cloning a repository by @LysandreJik in #517
- Update Hugging Face Hub documentation README and Endpoints by @muellerzr in #527
- Convert string functions to f-string by @muellerzr in #536
- Fixing FS for espnet by @Narsil in #542
- [snapshot_download] upgrade to canonical separator by @julien-c in #545
- Add test directions by @muellerzr in #547
- [HOTFIX] Change test for missing_input to reflect back-end redirect changes by @muellerzr in #552
- Bring consistency to download and upload APIs by @muellerzr in #574
- Search by authors and string by @FrancescoSaverioZuppichini in #531
- Quick typo by @muellerzr in #575
New Contributors
- @kahne made their first contribution in #569
- @FrancescoSaverioZuppichini made their first contribution in #531
Full Changelog: v0.2.1...v0.4.0
v0.2.1: Patch release
This is a patch release fixing an issue with the notebook login.
5e2da9b#diff-fb1696cbcf008dd89dde5e8c1da9d4be5a8f7d809bc32f07d4453caba40df15f
v0.2.0: Access tokens, skip large files, local files only
Access tokens
Version v0.2.0 introduces compatibility with Hub access tokens. Access tokens are now the main login handler, and you can still log in with a username and password by pressing [Ctrl/CMD]+C at the login prompt.
The notebook login has been adapted to work with access tokens.
Skipping large files
The Repository class now has an additional parameter, skip_lfs_files, which allows cloning the repository while skipping the large file download.
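Git LFS documents a GIT_LFS_SKIP_SMUDGE environment variable. When it is set, checkouts keep small LFS pointer files instead of downloading the large contents. The sketch below uses a hypothetical helper to build such a clone command without running it. It illustrates one plausible mechanism and is not necessarily how Repository implements skip_lfs_files.

```python
import os
from typing import Dict, List, Tuple


def lfs_skipping_clone(url: str, local_dir: str) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: build a git clone command that skips LFS downloads."""
    env = dict(os.environ)
    # Git LFS variable: keep pointer files instead of fetching the real contents.
    env["GIT_LFS_SKIP_SMUDGE"] = "1"
    return ["git", "clone", url, local_dir], env


cmd, env = lfs_skipping_clone("https://huggingface.co/<user>/<model_name>", "local_folder")
# To actually clone: subprocess.run(cmd, env=env, check=True)
```

The large files can later be fetched selectively with Git LFS (for example `git lfs pull`) once they are needed.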
Local files only for snapshot_download
The snapshot_download method can now take local_files_only as a parameter to leverage previously downloaded files.
v0.1.2: Patch release
What's Changed
- clean_ok should be True by default by @LysandreJik in #462
Full Changelog: v0.1.1...v0.1.2
v0.1.1: Patch release
What's Changed
- Fix typing-extensions minimum version by @lhoestq in #453
- Fix argument order in create_repo for Repository.clone_from by @sgugger in #459
Full Changelog: v0.1.0...v0.1.1
v0.1.0: Optional token, `HfApi` begone, git prune
What's Changed
Version v0.1.0 is the first minor release of the huggingface_hub package, which promises better stability for upcoming versions. This update comes with big quality-of-life improvements.
Make token optional in all HfApi methods by @sgugger in #379
Previously, most methods of the HfApi class required the token to be passed explicitly. In this version, the token defaults to the one stored in the cache. This reorders some arguments, but backward compatibility is preserved in most cases; where it is not, an explicit error is raised.
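The fallback described above can be sketched as a small resolver: an explicit token wins, and otherwise the cached token is read from disk. The helper resolve_token is hypothetical. The cache path ~/.huggingface/token is taken from the v0.0.16 login output later on this page and may differ between versions.

```python
from pathlib import Path
from typing import Optional

# Cache location shown in the v0.0.16 login output; assumed, may differ by version.
DEFAULT_TOKEN_PATH = Path.home() / ".huggingface" / "token"


def resolve_token(token: Optional[str] = None, token_path: Path = DEFAULT_TOKEN_PATH) -> str:
    """Hypothetical helper: an explicit token wins, otherwise fall back to the cache."""
    if token is not None:
        return token
    if token_path.is_file():
        cached = token_path.read_text().strip()
        if cached:
            return cached
    # No token anywhere: raise an explicit error rather than guessing.
    raise ValueError("No token was passed and none is stored in the cache; please log in first.")


print(resolve_token("hf_explicit"))  # hf_explicit
```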
Root methods instead of HfApi by @LysandreJik in #388
The HfApi class now exposes its methods through the hf_api module, reducing the friction to access these helpers. See the example below:
# Previously
from huggingface_hub import HfApi
api = HfApi()
user = api.whoami()
# Now
from huggingface_hub.hf_api import whoami
user = whoami()
The HfApi class can still be imported and works as before for backward compatibility.
Add list_repo_files util by @sgugger in #395
Offers a list_repo_files function to ... list the repo files! It supports both model repositories and dataset repositories.
Add helper to generate an eval result model-index, with proper typing by @julien-c in #382
Offers a metadata_eval_result helper to generate a YAML block to put in model cards based on evaluation results.
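The sketch below illustrates the kind of nested structure such a model-index block holds: a model name, then results with a task, a dataset and a list of metrics. The helper build_model_index is hypothetical and uses a simplified layout (only type fields). It is not the signature of metadata_eval_result. It prints JSON, which YAML 1.2 parsers generally accept, so no YAML library is needed.

```python
import json
from typing import Dict, List


def build_model_index(model_name: str, task_type: str, dataset_type: str,
                      metrics: Dict[str, float]) -> Dict[str, List[dict]]:
    """Hypothetical helper: nest evaluation results under a model-index key."""
    result = {
        "task": {"type": task_type},
        "dataset": {"type": dataset_type},
        # One entry per metric, keeping the caller's order.
        "metrics": [{"type": name, "value": value} for name, value in metrics.items()],
    }
    return {"model-index": [{"name": model_name, "results": [result]}]}


card_data = build_model_index("my-model", "text-classification", "glue", {"accuracy": 0.91})
print(json.dumps(card_data, indent=2))
```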
Add metrics to API by @mariosasko in #429
Adds a list_metrics method to HfApi!
Git prune by @LysandreJik in #450
Adds a git_prune method to the Repository class. This prunes local files that are no longer needed because they have already been pushed to a remote.
It also adds the argument auto_lfs_prune to git_push and to the commit context manager for simpler handling.
Bug fixes
- Fix HfApi.create_repo when repo_type is 'space' by @nateraw in #394
- Last fixes for datasets' push_to_hub method by @LysandreJik in #415
Full Changelog: v0.0.19...v0.1.0
v0.0.18: Repo metadata, git tags, Keras mixin
Repository metadata (@julien-c)
Version v0.0.18 of huggingface_hub includes tools to manage repository metadata. The following example reads metadata from a repository:
from huggingface_hub import Repository
repo = Repository("xxx", clone_from="yyy")
data = repo.repocard_metadata_load()
The following example completes that metadata before writing it to the repository locally.
data["license"] = "apache-2.0"
repo.repocard_metadata_save(data)
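On the Hub, model card metadata is the YAML block delimited by "---" lines at the top of README.md. The stdlib-only sketch below uses hypothetical helpers to separate that block from the Markdown body and to join them back together. Actually parsing the YAML would require a YAML library such as PyYAML. In the usage example, a plain string replacement stands in for that parsing step.

```python
import re
from typing import Tuple

# Front matter: a block delimited by "---" lines at the very start of the file.
_FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def split_front_matter(readme: str) -> Tuple[str, str]:
    """Hypothetical helper: return (yaml_text, markdown_body)."""
    match = _FRONT_MATTER.match(readme)
    if match is None:
        # No metadata block: the whole file is body.
        return "", readme
    return match.group(1), readme[match.end():]


def join_front_matter(yaml_text: str, body: str) -> str:
    """Inverse of split_front_matter, used when saving updated metadata."""
    return f"---\n{yaml_text}\n---\n{body}"


readme = "---\nlicense: mit\n---\n# My model\n"
meta, body = split_front_matter(readme)
updated = join_front_matter(meta.replace("license: mit", "license: apache-2.0"), body)
```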
Git tags (@AngledLuffa)
Tag management is now available! Add, check, and delete tags locally or remotely, directly from the Repository utility.
- Tags #323 (@AngledLuffa)
Revisited Keras support (@nateraw)
The Keras mixin has been revisited:
- It now saves models as SavedModel objects rather than .h5 files.
- It now offers methods that can be used as a simple functional API, instead of having to use the Mixin as an actual mixin.
Improvements and bug fixes
v0.0.17: Non-blocking git push, notebook login
Non-blocking git-push
The pushing methods now accept a blocking boolean parameter; setting blocking=False makes the push happen asynchronously.
To check whether the push has finished, or to read its status code (to spot a failure), use the command_queue property of the Repository object.
For example:
from huggingface_hub import Repository
repo = Repository("<local_folder>", clone_from="<user>/<model_name>")
with repo.commit("Commit message", blocking=False):
    # Save data
    ...
last_command = repo.command_queue[-1]
# Status of the push command
last_command.status
# Will return the status code
# -> -1 will indicate the push is still ongoing
# -> 0 will indicate the push has completed successfully
# -> non-zero code indicates the error code if there was an error
# if there was an error, the stderr may be inspected
last_command.stderr
# Whether the command finished or if it is still ongoing
last_command.is_done
# Whether the command errored-out.
last_command.failed
When using blocking=False, the commands are tracked and your script exits only when all pushes are done, even if other errors happen in your script (a failed push counts as done).
- Non blocking git push #315 (@LysandreJik)
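The status codes listed above map directly onto is_done and failed. The sketch below restates those rules with a hypothetical FakeCommand class and a summarize helper that reports pushes still running and the stderr of failed ones. It is an assumption that the real command objects compute these properties this way.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeCommand:
    """Hypothetical stand-in for an entry of repo.command_queue."""
    status: int = -1  # -1 while the push is still running
    stderr: str = ""

    @property
    def is_done(self) -> bool:
        return self.status != -1

    @property
    def failed(self) -> bool:
        # 0 is success; any other finished status is an error code.
        return self.is_done and self.status != 0


def summarize(queue: List[FakeCommand]) -> Tuple[int, List[str]]:
    """Count pushes still running and collect the stderr of failed ones."""
    pending = sum(1 for cmd in queue if not cmd.is_done)
    errors = [cmd.stderr for cmd in queue if cmd.failed]
    return pending, errors


queue = [FakeCommand(0), FakeCommand(-1), FakeCommand(1, "remote: rejected")]
print(summarize(queue))  # (1, ['remote: rejected'])
```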
Notebook login (@sgugger)
The huggingface_hub library now has a notebook_login method, which can be used to log in from notebooks that have no access to a shell. In a notebook, log in with the following:
from huggingface_hub import notebook_login
notebook_login()
Improvements and bugfixes
- added option to create private repo #319 (@philschmid)
- display git push warnings #326 (@elishowk)
- Allow specifying data with the Inference API wrapper #271 (@osanseviero)
- Add auth to snapshot download #340 (@lewtun)
v0.0.16: Progress bars, git credentials
Version v0.0.16 of huggingface_hub introduces several quality-of-life improvements.
Progress bars in Repository
Progress bars are now visible with many git operations, such as pulling, cloning and pushing:
>>> from huggingface_hub import Repository
>>> repo = Repository("local_folder", clone_from="huggingface/CodeBERTa-small-v1")
Cloning https://huggingface.co/huggingface/CodeBERTa-small-v1 into local empty directory.
Download file pytorch_model.bin: 45%|████████████████████████████▋ | 144M/321M [00:13<00:12, 14.7MB/s]
Download file flax_model.msgpack: 42%|██████████████████████████▌ | 134M/319M [00:13<00:13, 14.4MB/s]
Branching support
There is now branching support in Repository. The following example clones the xxx repository and checks out the new-branch revision. If new-branch is an existing branch on the remote, that branch is checked out. If it is another kind of revision, such as a commit or a tag, that revision is checked out as well.
If the revision does not exist, a branch is created from the latest commit on the main branch.
>>> from huggingface_hub import Repository
>>> repo = Repository("local", clone_from="xxx", revision="new-branch")
Once the repository is instantiated, revisions can be checked out manually with the git_checkout method. If the revision already exists:
>>> repo.git_checkout("main")
If a branch should be created from the current head in the case that it does not exist:
>>> repo.git_checkout("brand-new-branch", create_branch_ok=True)
Revision `brand-new-branch` does not exist. Created and checked out branch `brand-new-branch`
Finally, the commit context manager has a new branch parameter that specifies which branch the utility should push to:
>>> with repo.commit("New commit on branch brand-new-branch", branch="brand-new-branch"):
...     # Save any file or model here, it will be committed to that branch.
...     torch.save(model.state_dict(), "pytorch_model.bin")
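The revision rules described above fit in a small decision function. plan_checkout and its string results are hypothetical. They only restate the documented behavior and do not call git. The base parameter defaults to main, as for a clone. Passing the current head instead approximates git_checkout with create_branch_ok=True.

```python
from typing import Set


def plan_checkout(revision: str, remote_branches: Set[str], other_revisions: Set[str],
                  base: str = "main") -> str:
    """Hypothetical helper: describe what checking out `revision` would do."""
    if revision in remote_branches:
        return f"checkout branch {revision}"
    if revision in other_revisions:
        # Commits and tags are checked out directly.
        return f"checkout {revision}"
    # Unknown revision: start a new branch from `base`.
    return f"create branch {revision} from {base}"


print(plan_checkout("new-branch", {"main"}, set()))  # create branch new-branch from main
```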
Git credentials
The login system has been redesigned to leverage git-credential instead of a token-based authentication system. It uses the git-credential store helper. If you're unfamiliar with it, you may see the following when logging in with huggingface_hub:
[Hugging Face ASCII-art banner]
Username:
Password:
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-crendential store but this isn't the helper defined on your machine.
You will have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal to set it as the default
git config --global credential.helper store
Running git config --global credential.helper store sets this as the default way to handle credentials for git authentication. All repositories instantiated with the Repository utility have this helper set by default, so no action is required on your part when using it.
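Git's store helper keeps credentials unencrypted in a plain-text file, by default ~/.git-credentials, with one URL per line. Special characters in the username or password are percent-encoded. The sketch below only formats such a line with a hypothetical helper. huggingface_hub does not necessarily write this file itself.

```python
from urllib.parse import quote


def credential_line(username: str, token: str, host: str = "huggingface.co") -> str:
    """Hypothetical helper: format one entry of a git-credential-store file."""
    # Percent-encode every reserved character so the URL stays parseable.
    return f"https://{quote(username, safe='')}:{quote(token, safe='')}@{host}"


print(credential_line("alice", "hf_xxx"))  # https://alice:hf_xxx@huggingface.co
```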
Improved logging
The logging system is now similar to the existing logging systems in transformers and datasets, based on a logging module that controls the entire library's logging level:
>>> from huggingface_hub import logging
>>> logging.set_verbosity_error()
>>> logging.set_verbosity_info()
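A library-wide verbosity switch can be built on Python's standard logging module. Give the library one root logger and let every module logger be its child, so setting one level controls them all. The sketch below uses a hypothetical logger name, my_library. That huggingface_hub's logging module follows exactly this layout is an assumption.

```python
import logging
from typing import Optional

_ROOT_NAME = "my_library"  # hypothetical; a real library would use its package name


def get_logger(name: Optional[str] = None) -> logging.Logger:
    # Module loggers are children of the library root, e.g. "my_library.repository".
    return logging.getLogger(f"{_ROOT_NAME}.{name}" if name else _ROOT_NAME)


def set_verbosity(level: int) -> None:
    # Children left at NOTSET inherit this as their effective level.
    get_logger().setLevel(level)


def set_verbosity_error() -> None:
    set_verbosity(logging.ERROR)


def set_verbosity_info() -> None:
    set_verbosity(logging.INFO)


set_verbosity_error()
repo_logger = get_logger("repository")
```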
Bug fixes and improvements
- Add documentation to GitHub and the Hub docs about the Inference client wrapper #253 (@osanseviero)
- Have large files enabled by default when using Repository #219 (@LysandreJik)
- Clarify/specify/document model card metadata, model-index, and pipeline/task types #265 (@julien-c)
- [model_card][metadata] Actually, lets make dataset.name required #267 (@julien-c)
- Progress bars #261 (@LysandreJik)
- Add keras mixin #230 (@nateraw)
- Open source code related to the repo type (tag icon, display order, snippets) #273 (@osanseviero)
- Branch push to hub #276 (@LysandreJik)
- Git credentials #277 (@LysandreJik)
- Push to hub/commit with branches #282 (@LysandreJik)
- Better logging #262 (@LysandreJik)
- Remove custom language pack behavior #291 (@LysandreJik)
- Update Hub and huggingface_hub docs #293 (@osanseviero)
- Adding a handler #292 (@LysandreJik)
v0.0.15: Documentation, bug fixes and misc improvements
Improvements and bugfixes
- [Docs] Update link to Gradio documentation #206 (@abidlabs)
- Fix title typo (Cliet -> Client) #207 (@cakiki)
- add _from_pretrained hook #159 (@nateraw)
- Add filename option to lfs_track #212 (@LysandreJik)
- Repository fixes #213 (@LysandreJik)
- Repository documentation #214 (@LysandreJik)
- Add datasets filtering and sorting #194 (@lhoestq)
- doc: sync github to spaces #221 (@borisdayma)
- added batch transform documentation & model archive documentation #224 (@philschmid)
- Sync with hf internal #228 (@mishig25)
- Adding batching support for superb #215 (@Narsil)
- Adding SD for superb (speech-classification). #225 (@Narsil)
- Use Hugging Face fork for s3prl #229 (@lewtun)
- Mv interfaces -> widgets/lib/interfaces #227 (@mishig25)
- Tweak to prevent accidental sharing of token #226 (@julien-c)
- Fix CLI-based repo creation #234 (@osanseviero)
- Add proxify util function #235 (@mishig25)