
Enhanced InferenceClient #1474

Merged: 50 commits merged into main from enhanced-inference-client on May 31, 2023

Conversation

@Wauplin (Contributor) commented May 15, 2023

Goal:

The goal of this PR is to refactor the existing InferenceAPI client. The aim is to:

  1. Support both InferenceAPI (free) and InferenceEndpoint (paid product) in a single client
  2. Provide a nice UX inspired by the JS client, namely:
    • 1 method per task, e.g. summary = client.summarization("this is a long text")
    • list the main parameters explicitly (see the JS example), e.g. max_length
    • parse the output depending on the pipeline (text, image, audio, ...)
    • recommend 1 model per task if the user doesn't know which one to use
  3. Support custom requests for Inference Endpoints (through a generic .post() method); see the sketch below
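
A minimal sketch of the intended UX (the explicit max_length parameter and the endpoint URL below are illustrative placeholders, not the final API):

# Sketch only: parameter names and the endpoint URL are illustrative, not the final API.
from huggingface_hub import InferenceClient

# InferenceAPI (free): no model specified, a recommended model is picked for the task
client = InferenceClient()
summary = client.summarization("This is a long text...", max_length=50)

# InferenceEndpoint (paid): point the same client at a deployed endpoint URL
endpoint_client = InferenceClient(model="https://my-endpoint.endpoints.huggingface.cloud")
summary = endpoint_client.summarization("This is a long text...")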

Documentation:


Philosophy: "easy to use is better than being optimal or exhaustive"

Examples:

  • images parsed as PIL.Image by default
  • inputs can be provided as bytes, files, URLs,...
  • 1 recommended model per task
  • only the main parameters are exposed (and let power users deal with more complex stuff)
  • timeout works for both "service unavailable" and proper timeout

# Image classification with the default model
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...]

TODO (later)

  • update documentation
  • curate list of recommended models (+host it on the Hub)
  • how to handle asyncio? asyncio is much more relevant if we target devs
  • validate inputs/outputs with pydantic? => not a priority
  • deprecate InferenceAPI

Wauplin changed the title from "[Draft] Enhanced InferenceClient" to "[RFC] Enhanced InferenceClient" on May 15, 2023
@HuggingFaceDocBuilderDev commented May 15, 2023

The documentation is not available anymore as the PR was closed or merged.

@osanseviero (Contributor) left a comment:
Very cool! I played with it and it seems to be working nicely!


@staticmethod
def get_inference_api_url(model_id: str, task: str) -> str:
return f"{INFERENCE_ENDPOINT}/pipeline/{task}/{model_id}"
Contributor:
Note that the pipeline/{task} endpoint is not official as far as I know, and we have generally not suggested using it, at least until a few months ago. cc @Narsil. I think this is fine, but just so you know.

The official endpoint was always https://api-inference.huggingface.co/models/ as we can identify the task automatically from it

@Wauplin (Contributor Author):
I'd like a confirmation on this to be sure, but this is the URL we are already using in huggingface_hub, and it has been the case since the first PR (2 years ago).

Contributor:
Yeah 🤔 I think we never officially communicated about this endpoint nor documented it, as we pushed people to use the official one, but this one is more flexible and I think it is OK to use here.

@Narsil (Contributor):
Yes, it's not officially documented as it causes confusion, and generally it's easier for everyone to keep 1 model - 1 task.

The biggest use case for this endpoint would be feature-extraction vs sentence-similarity (similarity is easier to understand in the widget form, but feature-extraction is the most useful one in a production setting, where you're usually building a semantic database).

@Wauplin (Contributor Author):
Thanks for the details @Narsil. I think I'll keep the endpoint URL in this form (i.e. /task/...). In the end it's just a URL built for internal use, so I can make the method private. Most users will not even realize it if they use the correct method for their model.

And if a user tries to call a task on a model that doesn't support it, I'll gracefully handle the error to show the user the available task(s) for their model, e.g. along these lines:
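
A hypothetical sketch of that graceful handling (not the code in this PR; HfApi.model_info and its pipeline_tag attribute are existing huggingface_hub APIs, the rest is illustrative):

# Hypothetical sketch, not the implementation in this PR.
from huggingface_hub import HfApi
from requests import HTTPError

def _call_with_task_check(client, model_id: str, task: str, payload: dict):
    try:
        return client.post(json=payload, model=model_id)
    except HTTPError:
        # Look up which task the model actually supports and surface it
        # in a clearer error message instead of a raw HTTP error.
        supported = HfApi().model_info(model_id).pipeline_tag
        raise ValueError(
            f"Model '{model_id}' does not support task '{task}'. "
            f"Supported task: '{supported}'."
        ) from None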

@Wauplin (Contributor Author) commented May 22, 2023:
Realized after some experiments that if I call a wrong model <> task pair, the pipeline is loaded by the Inference API and then an HTTP 500 is returned. Not so efficient to catch client-side (and server-side). Since this use case is mostly useful for the feature-extraction task, I think I'll hardcode this one and use the more officially promoted https://api-inference.huggingface.co/models/ URL in all other cases, roughly like the sketch below.
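
A sketch of that URL resolution, under the assumption that only feature-extraction and sentence-similarity keep the pipeline/{task} form (not the exact code merged here):

# Sketch only, not the exact code merged in this PR.
INFERENCE_ENDPOINT = "https://api-inference.huggingface.co"

def _resolve_url(model_id: str, task: str) -> str:
    # These two tasks share the same underlying models, so the task must be
    # encoded in the URL; every other task uses the official /models/ URL.
    if task in ("feature-extraction", "sentence-similarity"):
        return f"{INFERENCE_ENDPOINT}/pipeline/{task}/{model_id}"
    return f"{INFERENCE_ENDPOINT}/models/{model_id}"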

@Wauplin (Contributor Author) commented May 19, 2023

I started adding tasks but am still wondering how I'll document/test/maintain this. Hopefully it will be low maintenance once every task is implemented.

Here is a working example with the current version. Audio and image inputs can be provided as bytes, a file path, or a URL. In output, audio is converted to bytes while images are parsed as PIL.Image. The philosophy is "let's make it as easy as possible for the user, and if it's too restrictive, let them use the generic client.post(...)" (see the sketch after the example below).

>>> from huggingface_hub import InferenceClient
>>> from pathlib import Path
>>> client = InferenceClient()

# Text-to-speech, then speech back to text
>>> audio = client.text_to_speech("Hello world")
b'fLaC\x00\x00...'
>>> Path("hello_world.wav").write_bytes(audio)
>>> client.audio_classification(audio)
[{'score': 0.4976358711719513, 'label': 'hap'}, {'score': 0.3677836060523987, 'label': 'neu'}, {'score': 0.1274358034133911, 'label': 'ang'}, {'score': 0.007144733797758818, 'label': 'sad'}]
>>> client.automatic_speech_recognition("hello_world.wav")
'LOW WHIRLD'

# Image classification
>>> client.image_classification("cat.jpg")
[{'score': 0.41461074352264404, 'label': 'tabby, tabby cat'}, ...]
>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...]

# Image segmentation
>>> client.image_segmentation("cat.jpg"):
[{'score': 0.989008, 'label': 'LABEL_184', 'mask': <PIL.PngImagePlugin.PngImageFile image mode=L size=400x300 at 0x7FDD2B129CC0>}, ...]

# Text summarization
>>> client.summarization("The Eiffel tower...")
'The Eiffel tower is one of the most famous landmarks in the world....'

# Chat
>>> client.conversational("Hi, who are you?")
{'generated_text': 'I am the one who knocks.', 'conversation': {'generated_responses': ['I am the one who knocks.'], 'past_user_inputs': ['Hi, who are you?']}, 'warnings': ['Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.']}
>>> client.conversational(
...     "Wow, that's scary!",
...     generated_responses=output["conversation"]["generated_responses"],
...     past_user_inputs=output["conversation"]["past_user_inputs"],
... )
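
And when a task-specific method is too restrictive, the generic post method is the escape hatch (a sketch: the payload and model shown here are illustrative, and the raw response is left for the caller to parse):

# Generic escape hatch (illustrative payload and model).
>>> raw = client.post(
...     json={"inputs": "The Eiffel tower...", "parameters": {"min_length": 10}},
...     model="facebook/bart-large-cnn",
... )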

@codecov bot commented May 19, 2023

Codecov Report

Patch coverage has no change and project coverage change: -2.65% ⚠️

Comparison is base (b3e21ec) 82.44% compared to head (b79edc0) 79.79%.

❗ Current head b79edc0 differs from pull request most recent head 158a67a. Consider uploading reports for the commit 158a67a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1474      +/-   ##
==========================================
- Coverage   82.44%   79.79%   -2.65%     
==========================================
  Files          53       55       +2     
  Lines        5707     5896     +189     
==========================================
  Hits         4705     4705              
- Misses       1002     1191     +189     
Impacted Files Coverage Δ
src/huggingface_hub/__init__.py 75.75% <ø> (ø)
src/huggingface_hub/_inference.py 0.00% <0.00%> (ø)
src/huggingface_hub/_inference_types.py 0.00% <0.00%> (ø)

... and 2 files with indirect coverage changes


@osanseviero (Contributor) left a comment:
Thanks a lot! I did a first pass

def _get_recommended_model(task: str) -> str:
Contributor:
Are we offering backwards-compatibility guarantees here? E.g. can we update the recommended model? What's the user expectation:

  • Is it a stable model that will never change (as in transformers pipelines)?
  • Or will it be the best model we curate for our users?

@Wauplin (Contributor Author):
No, I don't want to guarantee backward compatibility. I would rather mention in the docs and in the log message that we recommend passing the model explicitly instead. As I see it, we should aim to curate the best model for our users: the "best" model being one with good performance, but also one that we serve efficiently in the Inference API. This is why I think it should be decoupled from the transformers (and diffusers) pipelines.

Also, I'm in favor of hosting a JSON file on the Hub to store the recommended models (see the sketch below). This way we could update the list without making a release of huggingface_hub, and the same list could be shared with the JS client.
=> This is also why I'd rather not document the recommended model in the docstrings.
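
A hypothetical sketch of what that could look like (the repo id, file name, and fallback model below are placeholders, not real artifacts):

# Hypothetical sketch: the repo id and file name are placeholders, not real artifacts.
import json

from huggingface_hub import hf_hub_download

# Hardcoded fallback shipped with the library (example model only)
_FALLBACK_RECOMMENDED_MODELS = {"summarization": "facebook/bart-large-cnn"}

def _get_recommended_model(task: str) -> str:
    try:
        path = hf_hub_download(
            repo_id="huggingface/recommended-models",  # placeholder repo
            filename="recommended_models.json",        # placeholder file
            repo_type="dataset",
        )
        with open(path) as f:
            recommended = json.load(f)
    except Exception:
        # Offline or file unavailable: fall back to the hardcoded list
        recommended = _FALLBACK_RECOMMENDED_MODELS
    return recommended[task]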

@stevhliu (Member) left a comment:
Very cool guide and comprehensive API documentation! 💯

Love it when you ping me to look at something because I always learn something new!

@Wauplin (Contributor Author) commented May 30, 2023

@osanseviero @stevhliu Thanks a TON for the thorough review 🙏 I think I addressed all the points.
The main remaining question was about recommended models (see #1474 (comment)). TL;DR: I'd rather not guarantee anything, and will later default to a hosted config file for them.

@osanseviero (Contributor) left a comment:
Looks good!

Wauplin merged commit cf74108 into main on May 31, 2023
Wauplin deleted the enhanced-inference-client branch on May 31, 2023 13:57
Wauplin mentioned this pull request on May 31, 2023
@Wauplin (Contributor Author) commented May 31, 2023

I finally merged this one! :) And opened a follow-up issue to list the next steps: #1488
