This repository has been archived by the owner on Apr 6, 2023. It is now read-only.

Add docs #13

Closed
wants to merge 13 commits
17 changes: 17 additions & 0 deletions .github/workflows/build_documentation.yml
@@ -0,0 +1,17 @@
name: Build documentation

on:
push:
branches:
- main
- doc-builder*
- v*-release

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
with:
commit_sha: ${{ github.sha }}
package: hffs
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
Comment on lines +16 to +17
Contributor Author

@LysandreJik Can you help me set up this secret?

16 changes: 16 additions & 0 deletions .github/workflows/build_pr_documentation.yml
@@ -0,0 +1,16 @@
name: Build PR Documentation

on:
pull_request:

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
with:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}
package: hffs
13 changes: 13 additions & 0 deletions .github/workflows/delete_doc_comment.yml
@@ -0,0 +1,13 @@
name: Delete dev documentation

on:
pull_request:
types: [ closed ]


jobs:
delete:
uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
with:
pr_number: ${{ github.event.number }}
package: hffs
16 changes: 16 additions & 0 deletions .github/workflows/self-assign.yml
@@ -0,0 +1,16 @@
name: Self-assign
on:
issue_comment:
types: created
jobs:
one:
runs-on: ubuntu-latest
if: >-
(github.event.comment.body == '#take' ||
github.event.comment.body == '#self-assign')
&& !github.event.issue.assignee
steps:
- run: |
echo "Assigning issue ${{ github.event.issue.number }} to ${{ github.event.comment.user.login }}"
curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -d '{"assignees": ["${{ github.event.comment.user.login }}"]}' https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/assignees
curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -X "DELETE" https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/labels/help%20wanted
89 changes: 5 additions & 84 deletions README.md
@@ -1,5 +1,10 @@
# `hffs`

<a href="https://github.com/huggingface/hffs/actions/workflows/ci.yml?query=branch%3Amain"><img alt="Build" src="https://github.com/huggingface/hffs/actions/workflows/ci.yml/badge.svg?branch=main"></a>
<a href="https://github.com/huggingface/hffs/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/hffs.svg"></a>
<a href="https://github.com/huggingface/hffs"><img alt="Supported Python versions" src="https://img.shields.io/pypi/pyversions/hffs.svg"></a>
<a href="https://huggingface.co/docs/hffs/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/hffs/index.svg?down_color=red&down_message=offline&up_message=online&label=doc"></a>

`hffs` builds on [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) and [`fsspec`](https://github.com/fsspec/filesystem_spec) to provide a convenient Python filesystem interface to 🤗 Hub.

## Basic usage
@@ -56,87 +61,3 @@ The prefix for datasets is "datasets/", the prefix for spaces is "spaces/" and m
```bash
pip install hffs
```

## Usage examples

* [`pandas`](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-writing-remote-files)/[`dask`](https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html)

```python
>>> import pandas as pd

>>> # Read a remote CSV file into a dataframe
>>> df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")

>>> # Write a dataframe to a remote CSV file
>>> df.to_csv("hf://datasets/my-username/my-dataset-repo/test.csv")
```

* [`datasets`](https://huggingface.co/docs/datasets/filesystems#load-and-save-your-datasets-using-your-cloud-storage-filesystem)

```python
>>> import datasets

>>> # Export a (large) dataset to a repo
>>> output_dir = "hf://datasets/my-username/my-dataset-repo"
>>> builder = datasets.load_dataset_builder("path/to/local/loading_script/loading_script.py")
>>> builder.download_and_prepare(output_dir, file_format="parquet")

>>> # Stream the dataset from the repo
>>> dset = datasets.load_dataset("my-username/my-dataset-repo", split="train", streaming=True)
>>> # Process the examples
>>> for ex in dset:
... ...
```

* [`zarr`](https://zarr.readthedocs.io/en/stable/tutorial.html#io-with-fsspec)

```python
>>> import numpy as np
>>> import zarr

>>> embeddings = np.random.randn(50000, 1000).astype("float32")

>>> # Write an array to a repo acting as a remote zarr store
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="w") as root:
... foo = root.create_group("embeddings")
... foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4')
... foobar[:] = embeddings

>>> # Read from a remote zarr store
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="r") as root:
... first_row = root["embeddings/experiment_0"][0]
```

* [`duckdb`](https://duckdb.org/docs/guides/python/filesystems)

```python
>>> import hffs
>>> import duckdb

>>> fs = hffs.HfFileSystem()
>>> duckdb.register_filesystem(fs)
>>> # Query a remote file and get the result as a dataframe
>>> df = duckdb.query("SELECT * FROM 'hf://datasets/my-username/my-dataset-repo/data.parquet' LIMIT 10").df()
```

## Authentication

To write to your repositories or access your private repositories, you can log in by running

```bash
huggingface-cli login
```

Or pass a token (from your [HF settings](https://huggingface.co/settings/tokens)) to

```python
>>> import hffs
>>> fs = hffs.HfFileSystem(token=token)
```

or as `storage_options`:

```python
>>> storage_options = {"token": token}
>>> df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv", storage_options=storage_options)
```
88 changes: 88 additions & 0 deletions docs/README.md
@@ -0,0 +1,88 @@
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Generating the documentation

To generate the documentation, you need to install our special tool that builds it:

```bash
pip install git+https://github.com/huggingface/doc-builder
```

---
**NOTE**

You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look before committing for instance). You don't have to commit the built documentation.

---

## Building the documentation

Once you have set up `doc-builder` and the additional packages, you can generate the documentation by
running the following command:

```bash
doc-builder build hffs docs/source --build_dir ~/tmp/test-build
```

You can adapt the `--build_dir` to set any temporary folder that you prefer. This command will create it and generate
the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite
Markdown editor.

## Previewing the documentation

To preview the docs, first install the `watchdog` module with:

```bash
pip install watchdog
```

Then run the following command:

```bash
doc-builder preview {package_name} {path_to_docs}
```

For example:

```bash
doc-builder preview hffs docs/source/
```

The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR: a bot will add a comment with a link to the documentation built from your changes.

---
**NOTE**

The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` and restart the `preview` command (`Ctrl+C` to stop it, then call `doc-builder preview ...` again).

---

## Adding a new element to the navigation bar

Accepted files are Markdown (.md or .mdx).

Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting
the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/hffs/blob/main/docs/source/_toctree.yml) file.

## Adding an image

Because the repository grows quickly, it is important not to add files that would significantly weigh it down. This includes images, videos, and other non-text files. We prefer to place such files in a `dataset` hosted on hf.co, like
the ones on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing), and reference
them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
If you are making an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate them
to this dataset.
6 changes: 6 additions & 0 deletions docs/source/_toctree.yml
@@ -0,0 +1,6 @@
- title: Get Started
sections:
- local: index
title: 🤗 Filesystem
- local: integration_zoo
title: Integration Zoo
87 changes: 87 additions & 0 deletions docs/source/index.mdx
@@ -0,0 +1,87 @@
# Filesystem

🤗 Filesystem (`hffs`) is a package that provides a pythonic [fsspec-compatible](https://filesystem-spec.readthedocs.io/en/latest/) file interface to the [Hugging Face Hub](https://huggingface.co/). It builds on top of the [Hugging Face Hub client library](https://huggingface.co/docs/huggingface_hub/index) to read and write files and inspect repositories on the Hub.

## Installation

```bash
pip install hffs
```

## Usage

`HfFileSystem` is the library's main class. It holds connection information and enables typical filesystem-style operations such as `cp`, `mv`, `ls`, `du`, `glob`, `get_file`, and `put_file`.

```python
>>> from hffs import HfFileSystem
>>> fs = HfFileSystem()

>>> # List files in a directory
>>> fs.ls("datasets/my-username/my-dataset-repo/data", detail=False)
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # List all ".csv" files in a repo
>>> fs.glob("datasets/my-username/my-dataset-repo/**.csv")
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # Read the contents of a remote file
>>> with fs.open("datasets/my-username/my-dataset-repo/data/train.csv", "r") as f:
... train_data = f.readlines()

>>> # Read all the contents of a remote file at once as a string
>>> train_data = fs.read_text("datasets/my-username/my-dataset-repo/data/train.csv")

>>> # Write a remote file
>>> with fs.open("datasets/my-username/my-dataset-repo/data/validation.csv", "w") as f:
... f.write("text,label")
... f.write("Fantastic movie!,good")
```

The prefix for datasets is "datasets/", the prefix for spaces is "spaces/" and models don't need a prefix in the URL.
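
For example, here is how the same call addresses each repo type (the repository names are placeholders):

```python
>>> # Dataset repos take the "datasets/" prefix
>>> fs.ls("datasets/my-username/my-dataset-repo", detail=False)

>>> # Space repos take the "spaces/" prefix
>>> fs.ls("spaces/my-username/my-space-repo", detail=False)

>>> # Model repos need no prefix
>>> fs.ls("my-username/my-model-repo", detail=False)
```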

The optional `revision` argument can be passed to open a filesystem from a specific commit (any revision such as a branch or a tag name or a commit hash).
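
For example, a minimal sketch, assuming `revision` is accepted by the `HfFileSystem` constructor:

```python
>>> # Pin the filesystem to a branch, tag, or commit hash
>>> fs = HfFileSystem(revision="main")
>>> fs.ls("datasets/my-username/my-dataset-repo", detail=False)
```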

Unlike Python's built-in `open`, `fsspec`'s `open` defaults to binary mode, `"rb"`. This means you must explicitly set the mode to `"r"` to read and `"w"` to write in text mode.
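
For example:

```python
>>> # Default mode is "rb": the file object yields bytes
>>> with fs.open("datasets/my-username/my-dataset-repo/data/train.csv") as f:
...     header = f.readline()  # bytes

>>> # Pass "r" explicitly to read the same file as text
>>> with fs.open("datasets/my-username/my-dataset-repo/data/train.csv", "r") as f:
...     header = f.readline()  # str
```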

## Integration
Member

Maybe it'd make more sense to move the Integration section before the Usage section? It might be good for the user to check if they can use a URL with an integration before they start using the filesystem operations.

Contributor Author

This order comes from the s3fs docs, so I think I'll leave it as-is.


🤗 Filesystem can be used with any library that integrates `fsspec`, and the URL has the following structure:

```
hf://[<repo_type_prefix>]<repo_id>/<path/in/repo>
```
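
For example, with `pandas` (which accepts fsspec URLs), a CSV file in a dataset repo can be read directly — the repository name here is a placeholder:

```python
>>> import pandas as pd

>>> # "datasets/" is the <repo_type_prefix>; model repos would omit it
>>> df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")
```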

Most integrations also allow you to pass optional parameters, such as `revision`, to the filesystem's initializer as `storage_options`, a dictionary mapping parameter names to their values:

```python
>>> storage_options = {"revision": "main"}
```
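
These `storage_options` are forwarded to `HfFileSystem` by the calling library. For instance, continuing the `pandas` sketch above:

```python
>>> df = pd.read_csv(
...     "hf://datasets/my-username/my-dataset-repo/train.csv",
...     storage_options={"revision": "main"},
... )
```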

## Authentication

In many cases, you must be logged in with a Hugging Face account to interact with the Hub:

```bash
huggingface-cli login
```

Refer to the [Login](https://huggingface.co/docs/huggingface_hub/quick-start#login) section of the Hugging Face Hub client library documentation to learn more about authentication methods on the Hub.

It is also possible to log in programmatically by passing your `token` as an argument to `HfFileSystem`:

```python
>>> import hffs
>>> fs = hffs.HfFileSystem(token=token)
```

If you log in this way, be careful not to accidentally leak the token when sharing your source code!

## API Reference

As 🤗 Filesystem is based on [fsspec](https://filesystem-spec.readthedocs.io/en/latest/), it is compatible with most of the APIs that fsspec offers. For more details, check out fsspec's [API Reference](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem).
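
For example, generic `fsspec` methods that are not shown above should also work — a quick sketch with the placeholder repo used earlier:

```python
>>> # Recursively list every file in a repo
>>> fs.find("datasets/my-username/my-dataset-repo")

>>> # Total size of the repo's files, in bytes
>>> fs.du("datasets/my-username/my-dataset-repo")
```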


Read the [Integration Zoo](integration_zoo) guide to learn more about libraries that integrate with `fsspec`, allowing convenient access to the Hub through 🤗 Filesystem.

If you have questions about 🤗 Filesystem, feel free to join and ask the community on our [forum](https://discuss.huggingface.co/).
