InferenceClient: handle error response when streaming tokens from TGI #1651
Labels: good first issue
In InferenceClient.text_generation, when using a TGI endpoint with stream=True, it is possible that a serialized error is returned instead of a token (e.g. an out-of-memory error). Once an error is received, we should parse it correctly and raise an appropriate Python error. At the moment, a Pydantic error is raised because the payload cannot be validated against the TextGenerationStreamResponse schema. Once an error is received, we can safely assume that no more tokens will be received in the stream.
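For context, a minimal reproduction of how the issue surfaces on the client side might look like this (the endpoint URL and prompt are placeholders, not part of the report):

```python
from huggingface_hub import InferenceClient

# Hypothetical TGI endpoint URL, for illustration only.
client = InferenceClient(model="https://my-tgi-endpoint.example")

# With stream=True, tokens are yielded one by one. If the server streams back a
# serialized error mid-generation (e.g. out of memory), the current parsing code
# fails with a Pydantic validation error instead of a meaningful Python exception.
for token in client.text_generation("Tell me a story", max_new_tokens=512, stream=True):
    print(token, end="", flush=True)
```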
The Rust schema is:
The logic to parse a stream of tokens from TGI can be found in huggingface_hub/src/huggingface_hub/inference/_common.py (line 263 at commit 89cc691).
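As a rough, non-authoritative sketch of the direction a fix could take (the helper name is made up, and the error payload format, assumed here to be a JSON object with "error"/"error_type" fields, would need to be checked against the Rust schema above):

```python
import json


# Illustrative sketch only, not the actual huggingface_hub implementation:
# check a single "data: ..." payload from a TGI stream for a serialized error
# before validating it as a TextGenerationStreamResponse.
def _maybe_raise_tgi_error(raw_payload: bytes) -> dict:
    if raw_payload.startswith(b"data:"):
        raw_payload = raw_payload[len(b"data:"):]
    data = json.loads(raw_payload)

    # Assumption: TGI serializes errors as {"error": "...", "error_type": "..."}
    # (e.g. for an out-of-memory error) instead of a regular token payload.
    if "error" in data:
        # Raise a plain Python exception instead of letting Pydantic fail on
        # schema validation. Once an error is received, no more tokens will
        # arrive, so the caller can simply stop iterating over the stream.
        raise RuntimeError(f"Error while generating text: {data['error']}")

    return data
```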