Add docs #13
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
name: Build documentation | ||
|
||
on: | ||
  push: | ||
    branches: | ||
      - main | ||
      - doc-builder* | ||
      - v*-release | ||
|
||
jobs: | ||
  build: | ||
    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main | ||
    with: | ||
      commit_sha: ${{ github.sha }} | ||
      package: hffs | ||
    secrets: | ||
      token: ${{ secrets.HUGGINGFACE_PUSH }} | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
name: Build PR Documentation | ||
|
||
on: | ||
  pull_request: | ||
|
||
concurrency: | ||
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} | ||
  cancel-in-progress: true | ||
|
||
jobs: | ||
  build: | ||
    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main | ||
    with: | ||
      commit_sha: ${{ github.event.pull_request.head.sha }} | ||
      pr_number: ${{ github.event.number }} | ||
      package: hffs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
name: Delete dev documentation | ||
|
||
on: | ||
  pull_request: | ||
    types: [ closed ] | ||
|
||
|
||
jobs: | ||
  delete: | ||
    uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main | ||
    with: | ||
      pr_number: ${{ github.event.number }} | ||
      package: hffs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
name: Self-assign | ||
on: | ||
  issue_comment: | ||
    types: created | ||
jobs: | ||
  one: | ||
    runs-on: ubuntu-latest | ||
    if: >- | ||
      (github.event.comment.body == '#take' || | ||
      github.event.comment.body == '#self-assign') | ||
      && !github.event.issue.assignee | ||
    steps: | ||
      - run: | | ||
          echo "Assigning issue ${{ github.event.issue.number }} to ${{ github.event.comment.user.login }}" | ||
          curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -d '{"assignees": ["${{ github.event.comment.user.login }}"]}' https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/assignees | ||
          curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -X "DELETE" https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/labels/help%20wanted |
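For readers less familiar with GitHub Actions expressions, the trigger condition and the two API calls of the self-assign workflow can be sketched in Python. This is an illustration only and not part of the PR: `should_self_assign` and `build_requests` are hypothetical names, the requests are built but never sent, and the lowercase comparison assumes GitHub's documented behavior of ignoring case when comparing strings in expressions.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"
TRIGGER_COMMENTS = ("#take", "#self-assign")


def should_self_assign(comment_body, current_assignee):
    """Mirror the workflow's `if`: a trigger comment on an unassigned issue."""
    # Assumption: GitHub expressions compare strings case-insensitively.
    return comment_body.lower() in TRIGGER_COMMENTS and not current_assignee


def build_requests(repository, issue_number, login, token):
    """Build (without sending) the two requests issued by the curl calls."""
    headers = {"Authorization": f"token {token}"}
    issue_url = f"{API_ROOT}/repos/{repository}/issues/{issue_number}"
    # First curl call: add the commenter as an assignee.
    assign = urllib.request.Request(
        f"{issue_url}/assignees",
        data=json.dumps({"assignees": [login]}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    # Second curl call: remove the "help wanted" label.
    unlabel = urllib.request.Request(
        f"{issue_url}/labels/help%20wanted",
        headers=headers,
        method="DELETE",
    )
    return assign, unlabel
```

Keeping the condition in a pure function makes it easy to see that a second `#take` comment on an already assigned issue does nothing.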
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
<!--- | ||
Copyright 2020 The HuggingFace Team. All rights reserved. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
--> | ||
|
||
# Generating the documentation | ||
|
||
To generate the documentation, you need to install our special tool that builds it: | ||
|
||
```bash | ||
pip install git+https://github.com/huggingface/doc-builder | ||
``` | ||
|
||
--- | ||
**NOTE** | ||
|
||
You only need to generate the documentation to inspect it locally (if you're planning changes and want to | ||
check how they look before committing for instance). You don't have to commit the built documentation. | ||
|
||
--- | ||
|
||
## Building the documentation | ||
|
||
Once you have set up `doc-builder` and the additional packages, you can generate the documentation by | ||
typing the following command: | ||
|
||
```bash | ||
doc-builder build hffs docs/source --build_dir ~/tmp/test-build | ||
``` | ||
|
||
You can adapt the `--build_dir` to set any temporary folder that you prefer. This command will create it and generate | ||
the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite | ||
Markdown editor. | ||
|
||
## Previewing the documentation | ||
|
||
To preview the docs, first install the `watchdog` module with: | ||
|
||
```bash | ||
pip install watchdog | ||
``` | ||
|
||
Then run the following command: | ||
|
||
```bash | ||
doc-builder preview {package_name} {path_to_docs} | ||
``` | ||
|
||
For example: | ||
|
||
```bash | ||
doc-builder preview hffs docs/source/ | ||
``` | ||
|
||
The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR: a bot will add a comment with a link to where the documentation with your changes lives. | ||
|
||
--- | ||
**NOTE** | ||
|
||
The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` and restart the `preview` command (`ctrl-c` to stop it, then call `doc-builder preview ...` again). | ||
|
||
--- | ||
|
||
## Adding a new element to the navigation bar | ||
|
||
Accepted files are Markdown (.md or .mdx). | ||
|
||
Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting | ||
the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/hffs/blob/main/docs/source/_toctree.yml) file. | ||
|
||
## Adding an image | ||
|
||
Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos and other non-text files. We prefer to leverage a hf.co hosted `dataset` like | ||
the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference | ||
them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images). | ||
If you are an external contributor, feel free to add the images to your PR and ask a Hugging Face member to migrate your images | ||
to this dataset. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
- title: Get Started | ||
  sections: | ||
  - local: index | ||
    title: 🤗 Filesystem | ||
  - local: integration_zoo | ||
    title: Integration Zoo |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# Filesystem | ||
|
||
🤗 Filesystem is a package that provides a pythonic [fsspec-compatible](https://filesystem-spec.readthedocs.io/en/latest/) file interface to the [Hugging Face Hub](https://huggingface.co/). It builds on top of the [Hugging Face Hub client library](https://huggingface.co/docs/huggingface_hub/index) to read and write files and inspect repositories on the Hub. | ||
|
||
## Installation | ||
|
||
```bash | ||
pip install hffs | ||
``` | ||
|
||
## Usage | ||
|
||
`HfFileSystem` is the library's main class that holds connection information and enables typical filesystem style operations like `cp`, `mv`, `ls`, `du`, `glob`, `get_file`, `put_file` etc. | ||
|
||
```python | ||
>>> from hffs import HfFileSystem | ||
>>> fs = HfFileSystem("my-username/my-dataset-repo", repo_type="dataset", revision="main") | ||
|
||
>>> # List files in a directory | ||
>>> fs.ls("data") | ||
['train.csv', 'test.csv'] | ||
|
||
>>> # List all ".csv" files in a repo | ||
>>> fs.glob("**.csv") | ||
['data/train.csv', 'data/test.csv'] | ||
|
||
>>> # Read the contents of a remote file | ||
>>> with fs.open("data/train.csv", "r") as f: | ||
...     train_data = f.readlines() | ||
|
||
>>> # Read all the contents of a remote file at once as a string | ||
>>> train_data = fs.read_text("data/train.csv") | ||
|
||
>>> # Write a remote file | ||
>>> with fs.open("data/validation.csv", "w") as f: | ||
...     f.write("text,label\n") | ||
...     f.write("Fantastic movie!,good\n") | ||
``` | ||
|
||
If not explicitly provided, the `repo_type` argument defaults to `model`. Besides `model`, other supported values are `dataset` and `space`. | ||
|
||
The optional `revision` argument can be passed to open a filesystem from a specific commit (any revision such as a branch or a tag name or a commit hash). | ||
|
||
|
||
Unlike Python's built-in `open`, `fsspec`'s `open` defaults to binary mode, `"rb"`. This means you must explicitly set the mode to `"r"` for reading and `"w"` for writing in text mode. | ||
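
The difference between the default binary mode and text mode can be tried without touching the Hub. The sketch below uses `fsspec`'s built-in in-memory filesystem as a stand-in for `HfFileSystem`; it assumes `fsspec` is installed, and the path `/demo/train.csv` is made up for the illustration.

```python
import fsspec

# In-memory filesystem shipped with fsspec, used here instead of the Hub.
fs = fsspec.filesystem("memory")

# Text mode must be requested explicitly with "w".
with fs.open("/demo/train.csv", "w") as f:
    f.write("text,label\n")
    f.write("Fantastic movie!,good\n")

# No mode given: fsspec opens the file as "rb" and returns bytes.
with fs.open("/demo/train.csv") as f:
    raw = f.read()

# Text mode must be requested explicitly with "r" to get strings back.
with fs.open("/demo/train.csv", "r") as f:
    lines = f.readlines()

print(type(raw).__name__, lines)
```

The same calls should behave the same way on any `fsspec`-compatible filesystem, since the mode handling comes from `fsspec` itself.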
|
||
## Integration | ||
Review comment: Maybe it'd make more sense to move the Integration section before the Usage section? It might be good for the user to check if they can use a URL with an integration before they start using the filesystem operations.
Reply: This order comes from the
||
|
||
🤗 Filesystem can be used with any library that integrates `fsspec`, and the URL has the following structure: | ||
|
||
``` | ||
hf://[<repo_type>/]<repo_id>[@<revision>]:/<path/in/repo> | ||
``` | ||
|
||
The `repo_type` and `revision` parameters are optional. Most integrations also allow you to pass optional parameters to the filesystem's initializer as `storage_options`, a dictionary mapping parameter names to their values. | ||
|
||
Before passing a URL to an integration, you can use `fsspec.get_fs_token_paths` to ensure the URL initializes a filesystem as expected: | ||
|
||
```python | ||
>>> import fsspec | ||
|
||
>>> def get_filesystem_state_and_path_from_url(url): | ||
...     fs, _, paths = fsspec.get_fs_token_paths(url) | ||
...     return {"repo_type": fs.repo_type, "repo_id": fs.repo_id, "revision": fs.revision or "main", "path/in/repo": paths[0]} | ||
|
||
>>> print(get_filesystem_state_and_path_from_url("hf://dataset/johndoe/cityskapes:/images/image0.jpg")) | ||
{'repo_type': 'dataset', 'repo_id': 'johndoe/cityskapes', 'revision': 'main', 'path/in/repo': 'images/image0.jpg'} | ||
|
||
>>> print(get_filesystem_state_and_path_from_url("hf://johndoe/cityskapes@dev:/")) | ||
{'repo_type': 'model', 'repo_id': 'johndoe/cityskapes', 'revision': 'dev', 'path/in/repo': ''} | ||
``` | ||
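
To make the URL structure concrete, here is a minimal standard-library sketch that splits a URL into the same four parts. It is not how 🤗 Filesystem parses URLs: `parse_hf_url` is a hypothetical helper, and it assumes that a leading `model`, `dataset` or `space` segment is always the repo type and that the defaults are `model` and `main`, as in the output above.

```python
import re

# hf://[<repo_type>/]<repo_id>[@<revision>]:/<path/in/repo>
_HF_URL = re.compile(
    r"^hf://"
    r"(?:(?P<repo_type>model|dataset|space)/)?"  # optional repo type
    r"(?P<repo_id>[^@:]+)"                       # e.g. "johndoe/cityskapes"
    r"(?:@(?P<revision>[^:]+))?"                 # optional branch, tag or commit
    r":/(?P<path>.*)$"                           # path inside the repo
)


def parse_hf_url(url):
    """Split an hf:// URL into repo_type, repo_id, revision and path."""
    match = _HF_URL.match(url)
    if match is None:
        raise ValueError(f"Not a valid hf:// URL: {url!r}")
    parts = match.groupdict()
    # Assumed defaults, matching the documented behavior above.
    parts["repo_type"] = parts["repo_type"] or "model"
    parts["revision"] = parts["revision"] or "main"
    return parts


print(parse_hf_url("hf://dataset/johndoe/cityskapes:/images/image0.jpg"))
print(parse_hf_url("hf://johndoe/cityskapes@dev:/"))
```

A regex like this cannot tell a repo type from a namespace that happens to be called `dataset`, which is one reason to check real URLs with `fsspec.get_fs_token_paths` instead.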
|
||
## Authentication | ||
Review comment: Maybe also move this under Integration so users know beforehand they should be logged in with their Hugging Face account. So:
|
||
|
||
In many cases, you must be logged in with a Hugging Face account to interact with the Hub. Refer to the [Login](https://huggingface.co/docs/huggingface_hub/quick-start#login) section of the Hugging Face Hub client library documentation to learn about authentication methods on the Hub. | ||
|
||
It is also possible to log in programmatically by passing your `token` as an argument to `HfFileSystem`. If you log in this way, be careful not to accidentally leak the token when sharing your source code! | ||
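
One way to lower the risk of leaking a token is to keep it out of the source file altogether. The sketch below reads it from an environment variable; the helper name `filesystem_kwargs` and the variable name `HF_TOKEN` are assumptions made for this example, not something the library requires.

```python
import os


def filesystem_kwargs(var_name="HF_TOKEN"):
    """Build the keyword arguments that carry the token to `HfFileSystem`."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return {"token": token}


# Usage (requires `hffs` and a real token, so it is not executed here):
#     from hffs import HfFileSystem
#     fs = HfFileSystem("my-username/my-dataset-repo", repo_type="dataset", **filesystem_kwargs())
```

The same dictionary could also be passed as `storage_options` to integrations that forward it to the filesystem's initializer.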
|
||
|
||
## API Reference | ||
|
||
As 🤗 Filesystem is based on [fsspec](https://filesystem-spec.readthedocs.io/en/latest/), it is compatible with most of the APIs that fsspec offers. For more details, check out fsspec's [API Reference](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem). | ||
|
||
|
||
Read the [Integration Zoo](integration_zoo) guide to learn more about libraries that integrate with `fsspec`, allowing convenient access to the Hub through 🤗 Filesystem. | ||
|
||
If you have questions about 🤗 Filesystem, feel free to join and ask the community on our [forum](https://discuss.huggingface.co/). | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Integration Zoo | ||
|
||
Below is a (non-exhaustive) list of usage examples showcasing 🤗 Filesystem's interesting integrations: | ||
|
||
* Reading/writing a [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-writing-remote-files) DataFrame from/to a 🤗 Hub repository: | ||
|
||
```python | ||
>>> import pandas as pd | ||
|
||
>>> # Read a remote CSV file into a dataframe | ||
>>> df = pd.read_csv("hf://dataset/my-username/my-dataset-repo:/train.csv") | ||
|
||
>>> # Write a dataframe to a remote CSV file | ||
>>> df.to_csv("hf://dataset/my-username/my-dataset-repo:/test.csv") | ||
``` | ||
|
||
The same workflow can also be used for [Dask](https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html) and [Polars](https://pola-rs.github.io/polars/py-polars/html/reference/io.html) DataFrames. | ||
|
||
* Querying (remote) 🤗 Hub files with [DuckDB](https://duckdb.org/docs/guides/python/filesystems): | ||
|
||
```python | ||
>>> from hffs import HfFileSystem | ||
>>> import duckdb | ||
|
||
>>> fs = HfFileSystem("my-username/my-dataset-repo", repo_type="dataset") | ||
>>> duckdb.register_filesystem(fs) | ||
>>> # Query a remote file and get the result back as a dataframe | ||
>>> df = duckdb.query("SELECT * FROM 'hf://data_dir/data.parquet' LIMIT 10").df() | ||
``` | ||
|
||
* Using 🤗 Hub as an array store with [Zarr](https://zarr.readthedocs.io/en/stable/tutorial.html#io-with-fsspec): | ||
|
||
```python | ||
>>> import numpy as np | ||
>>> import zarr | ||
|
||
>>> embeddings = np.random.randn(50000, 1000).astype("float32") | ||
|
||
>>> # Write an array to a repo | ||
>>> with zarr.open_group("hf://my-username/my-model-repo:/array-store", mode="w") as root: | ||
...     foo = root.create_group("embeddings") | ||
...     foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4') | ||
...     foobar[:] = embeddings | ||
|
||
>>> # Read an array from a repo | ||
>>> with zarr.open_group("hf://my-username/my-model-repo:/array-store", mode="r") as root: | ||
...     first_row = root["embeddings/experiment_0"][0] | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,4 +2,4 @@ | |
|
||
__version__ = "0.0.1.dev0" | ||
|
||
from .fs import HfFileSystem | ||
from .fs import HfFileSystem, HfFile |
Review comment: @LysandreJik Can you help me set up this secret?