Enhanced InferenceClient #1474
Conversation
The documentation is not available anymore as the PR was closed or merged.
Very cool! I played with it and it seems to be working nicely!
src/huggingface_hub/_inference.py (outdated)

@staticmethod
def get_inference_api_url(model_id: str, task: str) -> str:
    return f"{INFERENCE_ENDPOINT}/pipeline/{task}/{model_id}"
Note that the pipeline/{task} endpoint is not official as far as I know, and we have generally not suggested using it, at least until a few months ago. cc @Narsil. I think this is fine, but just so you know.
The official endpoint has always been https://api-inference.huggingface.co/models/, as we can identify the task automatically from the model.
I'd like a confirmation on this to be sure, but this is the URL we are already using in huggingface_hub, and it has been the case since the first PR (2 years ago).
Yeah 🤔 I think we never officially communicated about this endpoint nor documented it, as we pushed people to use the official one, but this one is more flexible and I think it's OK to use here.
Yes, it's not officially documented as it causes confusion, and generally it's easier for everyone to keep to 1 model - 1 task.
The biggest use case for this endpoint is feature-extraction vs sentence-similarity: similarity is easier to understand in the widget form, but feature-extraction is the most useful one in a production setting, where you're usually building a semantic database.
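For context, here is a hedged illustration of that point: the same sentence-transformers model hit through the two pipeline endpoints with plain requests. The model choice, token placeholder, and payload shapes are assumptions based on the documented Inference API task formats, not taken from this PR.

```python
import requests

MODEL = "sentence-transformers/all-MiniLM-L6-v2"      # example model choice
HEADERS = {"Authorization": "Bearer hf_xxx"}           # optional; replace with a real token

# feature-extraction: returns an embedding vector, useful for a semantic database
embedding = requests.post(
    f"https://api-inference.huggingface.co/pipeline/feature-extraction/{MODEL}",
    headers=HEADERS,
    json={"inputs": "A cat sat on the mat."},
).json()

# sentence-similarity: returns one score per candidate sentence (the widget use case)
scores = requests.post(
    f"https://api-inference.huggingface.co/pipeline/sentence-similarity/{MODEL}",
    headers=HEADERS,
    json={
        "inputs": {
            "source_sentence": "A cat sat on the mat.",
            "sentences": ["A feline rested on the rug.", "The sky is blue."],
        }
    },
).json()
```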
Thanks for the details @Narsil. I think I'll keep the endpoint URL in this form (i.e. /task/...). In the end it's just a URL built for internal use, so I can make the method private. Most users will not even realize it if they use the correct method for their model.
And if a user tries to call a task on a model that doesn't support it, I'll gracefully handle the error and show the user the available task(s) for their model (as sketched below).
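As a rough, hypothetical sketch of that error-handling idea (not this PR's actual code), one could look up the model's pipeline_tag on the Hub and surface it when a call fails:

```python
from requests import HTTPError
from huggingface_hub import HfApi


def _raise_with_supported_task(model_id: str, requested_task: str, error: HTTPError) -> None:
    # pipeline_tag is the task the model is configured for on the Hub
    supported_task = HfApi().model_info(model_id).pipeline_tag
    raise ValueError(
        f"Model '{model_id}' does not seem to support task '{requested_task}'. "
        f"Available task: '{supported_task}'."
    ) from error
```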
I realized after some experiments that if I call a wrong model <> task pair, the pipeline is loaded by the Inference API and then an HTTP 500 is returned. Not so efficient to catch client-side (and server-side). Since this use case is mostly useful for the feature-extraction task, I think I'll hardcode this one and use the more officially promoted https://api-inference.huggingface.co/models/ URL in all other cases.
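A minimal sketch of that URL-resolution logic, with an illustrative helper name rather than the merged implementation:

```python
INFERENCE_ENDPOINT = "https://api-inference.huggingface.co"


def _resolve_url(model_id: str, task: str) -> str:
    if task == "feature-extraction":
        # Only this task uses the pipeline/ form, to override the model's
        # default task (e.g. sentence-similarity models used for embeddings).
        return f"{INFERENCE_ENDPOINT}/pipeline/{task}/{model_id}"
    # Officially promoted endpoint: the task is inferred from the model itself.
    return f"{INFERENCE_ENDPOINT}/models/{model_id}"
```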
I started to add tasks but am still wondering how I'll document/test/maintain this. Hopefully low maintenance once every task is implemented. Here is a working example with the current version. Audio and image inputs can be provided as bytes, a file path, or a URL. In output, audio is converted to bytes while images are parsed as PIL.Image.

>>> from huggingface_hub import InferenceClient
>>> from pathlib import Path
>>> client = InferenceClient()

# Text-to-speech, then speech-to-text
>>> audio = client.text_to_speech("Hello world")
>>> audio
b'fLaC\x00\x00...'
>>> Path("hello_world.wav").write_bytes(audio)
>>> client.audio_classification(audio)
[{'score': 0.4976358711719513, 'label': 'hap'}, {'score': 0.3677836060523987, 'label': 'neu'}, {'score': 0.1274358034133911, 'label': 'ang'}, {'score': 0.007144733797758818, 'label': 'sad'}]
>>> client.automatic_speech_recognition("hello_world.wav")
'LOW WHIRLD'

# Image classification
>>> client.image_classification("cat.jpg")
[{'score': 0.41461074352264404, 'label': 'tabby, tabby cat'}, ...]
>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...]

# Image segmentation
>>> client.image_segmentation("cat.jpg")
[{'score': 0.989008, 'label': 'LABEL_184', 'mask': <PIL.PngImagePlugin.PngImageFile image mode=L size=400x300 at 0x7FDD2B129CC0>}, ...]

# Text summarization
>>> client.summarization("The Eiffel tower...")
'The Eiffel tower is one of the most famous landmarks in the world....'

# Chat
>>> output = client.conversational("Hi, who are you?")
>>> output
{'generated_text': 'I am the one who knocks.', 'conversation': {'generated_responses': ['I am the one who knocks.'], 'past_user_inputs': ['Hi, who are you?']}, 'warnings': ['Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.']}
>>> client.conversational(
...     "Wow, that's scary!",
...     generated_responses=output["conversation"]["generated_responses"],
...     past_user_inputs=output["conversation"]["past_user_inputs"],
... )
Codecov Report: Patch coverage has no change and project coverage change: -2.65%

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1474      +/-   ##
==========================================
- Coverage   82.44%   79.79%    -2.65%
==========================================
  Files          53       55        +2
  Lines        5707     5896      +189
==========================================
  Hits         4705     4705
- Misses       1002     1191      +189

☔ View full report in Codecov by Sentry.
Thanks a lot! I did a first pass
def _get_recommended_model(task: str) -> str:
Are we offering backward-compatibility guarantees here? E.g. can we update the recommended model? What's the user expectation:
- Is it a stable model that will never change (as in transformers pipelines)?
- Or will it be the best model we curate for our users?
No, I don't want to guarantee backward compatibility. I would rather mention in the docs and in the log message that we recommend passing the model explicitly instead. As I see it, we should aim to curate the best model for our users. The "best" model is one with good performance but also one that we serve efficiently in the Inference API. This is why I think it should be decoupled from the transformers (and diffusers) pipelines.
Also, I'm in favor of hosting a JSON file on the Hub to store the recommended models (sketched below). This way we could update the list without making a release of huggingface_hub. The same list could also be shared with the JS client.
=> This is also why I'd rather not document the recommended model in the docstrings.
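As a hedged sketch of that idea, assuming a hypothetical dataset repo and file name (neither exists as part of this PR):

```python
import json
from huggingface_hub import hf_hub_download


def fetch_recommended_models() -> dict:
    # Download the curated task -> model mapping from the Hub so it can be
    # updated without releasing a new version of huggingface_hub.
    path = hf_hub_download(
        repo_id="huggingface/recommended-models",  # hypothetical repo
        filename="recommended_models.json",        # hypothetical file
        repo_type="dataset",
    )
    with open(path) as f:
        # e.g. {"summarization": "facebook/bart-large-cnn", ...}
        return json.load(f)
```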
Very cool guide and comprehensive API documentation! 💯
Love it when you ping me to look at something because I always learn something new!
@osanseviero @stevhliu Thanks a TON for the thorough review 🙏 I think I addressed all the points.
Looks good!
I finally merged this one! :) And opened a follow-up issue to list the next steps: #1488
Goal:
The goal of this PR is to refactor the existing InferenceAPI client. The aim is to:
- provide task-specific methods, e.g. summary = client.summarization("this is a long text")
- expose task parameters (e.g. max_length)
- keep a generic way to send raw requests (the .post() method); see the sketch at the end of this description

Documentation:
A guide and API documentation for the new client are included in the PR.

Philosophy: "easy to use is better than being optimal or exhaustive"

Examples:
- images are returned as PIL.Image by default
- timeout works for both "service unavailable" and proper timeout

TODO (later):
- InferenceAPI (next steps are tracked in the follow-up issue #1488)
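For illustration, here is a rough sketch of the high-level vs low-level usage described in the goals above. The parameters argument for summarization, the post() signature, and the facebook/bart-large-cnn model are assumptions based on this description, not verified against the merged code.

```python
from huggingface_hub import InferenceClient

client = InferenceClient()

# High-level, task-specific method with an optional task parameter
summary = client.summarization("this is a long text", parameters={"max_length": 30})

# Low-level escape hatch: send a raw payload to any model and get raw bytes back
raw_bytes = client.post(
    json={"inputs": "this is a long text"},
    model="facebook/bart-large-cnn",  # example model, not prescribed by the PR
)
```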