
InferenceClient: handle error response when streaming tokens from TGI #1651

Closed
Wauplin opened this issue Sep 11, 2023 · 0 comments · Fixed by #1711
Labels
good first issue Good for newcomers

Comments

@Wauplin
Contributor

Wauplin commented Sep 11, 2023

In InferenceClient.text_generation, when using a TGI endpoint with stream=True, the server may return a serialized error instead of a token (e.g. an out-of-memory error). When that happens, we should parse the error payload and raise an appropriate Python exception. At the moment a Pydantic validation error is raised instead, because the payload cannot be validated against the TextGenerationStreamResponse schema.
Once an error is received, we can safely assume that no more tokens will arrive in the stream.

The Rust schema is:

#[derive(Serialize, ToSchema)]
pub(crate) struct ErrorResponse {
    pub error: String,
    pub error_type: String,
}
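
Not stated in the issue itself, but for context: TGI streams responses as server-sent events, so such an error would typically arrive as a data: line whose JSON body matches the schema above rather than a token. An illustrative payload (the message and error_type values are made up):

data: {"error": "Request failed during generation: CUDA out of memory", "error_type": "generation"}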

The logic to parse a stream of tokens from TGI can be found in _stream_text_generation_response.
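
A minimal sketch of the kind of handling being asked for (this is not the actual huggingface_hub code; the helper names and the exception type are placeholders for illustration):

import json
from typing import Any, Dict, Iterator


def _parse_sse_payload(byte_payload: bytes) -> Dict[str, Any]:
    # Server-sent events from TGI look like b'data: {...}'.
    if byte_payload.startswith(b"data:"):
        byte_payload = byte_payload[len(b"data:"):]
    return json.loads(byte_payload.decode("utf-8"))


def stream_tokens(byte_lines: Iterator[bytes]) -> Iterator[Dict[str, Any]]:
    for byte_payload in byte_lines:
        if not byte_payload.strip():
            continue  # skip SSE keep-alive / empty lines
        payload = _parse_sse_payload(byte_payload)

        # If the payload matches the ErrorResponse schema, surface it as a
        # Python exception instead of failing schema validation. No more
        # tokens will follow, so stopping the iteration here is safe.
        if "error" in payload:
            raise RuntimeError(
                f"Error from text-generation-inference "
                f"({payload.get('error_type', 'unknown')}): {payload['error']}"
            )

        # Otherwise the payload is a TextGenerationStreamResponse-shaped dict
        # and can be validated / yielded as before.
        yield payload

In an actual fix this would presumably raise a dedicated exception type rather than RuntimeError; the key point is checking for the error shape before attempting schema validation.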

@Wauplin Wauplin added this to the in next release? milestone Sep 11, 2023
@Wauplin Wauplin added the good first issue Good for newcomers label Sep 29, 2023