Enhanced InferenceClient #1474
Conversation
The documentation is not available anymore as the PR was closed or merged.
Very cool! I played with it and it seems to be working nicely!
src/huggingface_hub/_inference.py (outdated)

@staticmethod
def get_inference_api_url(model_id: str, task: str) -> str:
    return f"{INFERENCE_ENDPOINT}/pipeline/{task}/{model_id}"
Note that the pipeline/{task} endpoint is not official as far as I know, and we have generally not suggested using it, at least until a few months ago. cc @Narsil. I think this is fine, but just so you know.
The official endpoint has always been https://api-inference.huggingface.co/models/, as we can identify the task automatically from the model.
I'd like a confirmation on this to be sure, but this is the URL we are already using in huggingface_hub, and it has been the case since the first PR (2 years ago).
Yeah 🤔 I think we never officially communicated about this endpoint nor documented it, as we pushed people to use the official one, but this one is more flexible and I think it's OK to use here.
Yes, it's not officially documented as it causes confusion, and generally it's easier for everyone to keep to 1 model - 1 task.
The biggest use case for this endpoint is feature-extraction vs sentence-similarity: similarity is easier to understand in the widget form, but feature-extraction is the most useful one in a production setting, where you're usually building a semantic database.
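For context, here is a hedged illustration of that point: the same sentence-transformers model hit through the two pipeline endpoints with plain requests. The model choice, token placeholder, and payload shapes are assumptions based on the documented Inference API task formats, not taken from this PR.

```python
import requests

MODEL = "sentence-transformers/all-MiniLM-L6-v2"      # example model choice
HEADERS = {"Authorization": "Bearer hf_xxx"}           # optional; replace with a real token

# feature-extraction: returns an embedding vector, useful for a semantic database
embedding = requests.post(
    f"https://api-inference.huggingface.co/pipeline/feature-extraction/{MODEL}",
    headers=HEADERS,
    json={"inputs": "A cat sat on the mat."},
).json()

# sentence-similarity: returns one score per candidate sentence (the widget use case)
scores = requests.post(
    f"https://api-inference.huggingface.co/pipeline/sentence-similarity/{MODEL}",
    headers=HEADERS,
    json={
        "inputs": {
            "source_sentence": "A cat sat on the mat.",
            "sentences": ["A feline rested on the rug.", "The sky is blue."],
        }
    },
).json()
```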
Thanks for the details @Narsil. I think I'll keep the endpoint URL in this form (i.e. /task/...). In the end it's just a URL built for internal use, so I can make the method private. Most users will not even realize it if they use the correct method for their model.
And if a user tries to call a task on a model that doesn't support it, I'll gracefully handle the error and show the user the available task(s) for their model (as sketched below).
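As a rough, hypothetical sketch of that error-handling idea (not this PR's actual code), one could look up the model's pipeline_tag on the Hub and surface it when a call fails:

```python
from requests import HTTPError
from huggingface_hub import HfApi


def _raise_with_supported_task(model_id: str, requested_task: str, error: HTTPError) -> None:
    # pipeline_tag is the task the model is configured for on the Hub
    supported_task = HfApi().model_info(model_id).pipeline_tag
    raise ValueError(
        f"Model '{model_id}' does not seem to support task '{requested_task}'. "
        f"Available task: '{supported_task}'."
    ) from error
```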
I realized after some experiments that if I call a wrong model <> task pair, the pipeline is loaded by the Inference API and then an HTTP 500 is returned. Not so efficient to catch client-side (and server-side). Since this use case is mostly useful for the feature-extraction task, I think I'll hardcode this one and use the more officially promoted https://api-inference.huggingface.co/models/ URL in all other cases.
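A minimal sketch of that URL-resolution logic, with an illustrative helper name rather than the merged implementation:

```python
INFERENCE_ENDPOINT = "https://api-inference.huggingface.co"


def _resolve_url(model_id: str, task: str) -> str:
    if task == "feature-extraction":
        # Only this task uses the pipeline/ form, to override the model's
        # default task (e.g. sentence-similarity models used for embeddings).
        return f"{INFERENCE_ENDPOINT}/pipeline/{task}/{model_id}"
    # Officially promoted endpoint: the task is inferred from the model itself.
    return f"{INFERENCE_ENDPOINT}/models/{model_id}"
```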
I started to add tasks but am still wondering how I'll document/test/maintain this. Hopefully low maintenance once every task is implemented. Here is a working example with the current version. Audio and image inputs can be provided as bytes, a file path, or a URL. In output, audio is converted to bytes while images are parsed as PIL.Image.

>>> from huggingface_hub import InferenceClient
>>> from pathlib import Path
>>> client = InferenceClient()

# Text-to-speech, then speech-to-text
>>> audio = client.text_to_speech("Hello world")
>>> audio
b'fLaC\x00\x00...'
>>> Path("hello_world.wav").write_bytes(audio)
>>> client.audio_classification(audio)
[{'score': 0.4976358711719513, 'label': 'hap'}, {'score': 0.3677836060523987, 'label': 'neu'}, {'score': 0.1274358034133911, 'label': 'ang'}, {'score': 0.007144733797758818, 'label': 'sad'}]
>>> client.automatic_speech_recognition("hello_world.wav")
'LOW WHIRLD'

# Image classification
>>> client.image_classification("cat.jpg")
[{'score': 0.41461074352264404, 'label': 'tabby, tabby cat'}, ...]
>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...]

# Image segmentation
>>> client.image_segmentation("cat.jpg")
[{'score': 0.989008, 'label': 'LABEL_184', 'mask': <PIL.PngImagePlugin.PngImageFile image mode=L size=400x300 at 0x7FDD2B129CC0>}, ...]

# Text summarization
>>> client.summarization("The Eiffel tower...")
'The Eiffel tower is one of the most famous landmarks in the world....'

# Chat
>>> output = client.conversational("Hi, who are you?")
>>> output
{'generated_text': 'I am the one who knocks.', 'conversation': {'generated_responses': ['I am the one who knocks.'], 'past_user_inputs': ['Hi, who are you?']}, 'warnings': ['Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.']}
>>> client.conversational(
...     "Wow, that's scary!",
...     generated_responses=output["conversation"]["generated_responses"],
...     past_user_inputs=output["conversation"]["past_user_inputs"],
... )
Codecov Report: Patch coverage has no change and project coverage change: -2.65%

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1474      +/-   ##
==========================================
- Coverage   82.44%   79.79%    -2.65%
==========================================
  Files          53       55        +2
  Lines        5707     5896      +189
==========================================
  Hits         4705     4705
- Misses       1002     1191      +189

☔ View full report in Codecov by Sentry.
Thanks a lot! I did a first pass
def _get_recommended_model(task: str) -> str:
Are we offering backward-compatibility guarantees here? E.g. can we update the recommended model? What's the user expectation:
- Is it a stable model that will never change (as in transformers pipelines)?
- Or will it be the best model we curate for our users?
No, I don't want to guarantee backward compatibility. I would rather mention in the docs and in the log message that we recommend passing the model explicitly instead. As I see it, we should aim to curate the best model for our users. The "best" model is one with good performance but also one that we serve efficiently in the Inference API. This is why I think it should be decoupled from the transformers (and diffusers) pipelines.
Also, I'm in favor of hosting a JSON file on the Hub to store the recommended models (sketched below). This way we could update the list without making a release of huggingface_hub. The same list could also be shared with the JS client.
=> This is also why I'd rather not document the recommended model in the docstrings.
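As a hedged sketch of that idea, assuming a hypothetical dataset repo and file name (neither exists as part of this PR):

```python
import json
from huggingface_hub import hf_hub_download


def fetch_recommended_models() -> dict:
    # Download the curated task -> model mapping from the Hub so it can be
    # updated without releasing a new version of huggingface_hub.
    path = hf_hub_download(
        repo_id="huggingface/recommended-models",  # hypothetical repo
        filename="recommended_models.json",        # hypothetical file
        repo_type="dataset",
    )
    with open(path) as f:
        # e.g. {"summarization": "facebook/bart-large-cnn", ...}
        return json.load(f)
```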
Very cool guide and comprehensive API documentation! 💯
Love it when you ping me to look at something because I always learn something new!
@osanseviero @stevhliu Thanks a TON for the thorough review 🙏 I think I addressed all the points.
Looks good!
I finally merged this one! :) And opened a follow-up issue to list the next steps: #1488
Goal:
The goal of this PR is to refactor the existing InferenceAPI client. The aim is to:
- provide task-specific methods, e.g. summary = client.summarization("this is a long text")
- expose task parameters (e.g. max_length)
- keep a generic way to send raw requests (the .post() method); see the sketch at the end of this description

Documentation:
A guide and API documentation for the new client are included in the PR.

Philosophy: "easy to use is better than being optimal or exhaustive"

Examples:
- images are returned as PIL.Image by default
- timeout works for both "service unavailable" and proper timeout

TODO (later):
- InferenceAPI (next steps are tracked in the follow-up issue #1488)
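For illustration, here is a rough sketch of the high-level vs low-level usage described in the goals above. The parameters argument for summarization, the post() signature, and the facebook/bart-large-cnn model are assumptions based on this description, not verified against the merged code.

```python
from huggingface_hub import InferenceClient

client = InferenceClient()

# High-level, task-specific method with an optional task parameter
summary = client.summarization("this is a long text", parameters={"max_length": 30})

# Low-level escape hatch: send a raw payload to any model and get raw bytes back
raw_bytes = client.post(
    json={"inputs": "this is a long text"},
    model="facebook/bart-large-cnn",  # example model, not prescribed by the PR
)
```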