
InferenceClient: handle error response when streaming tokens from TGI #1651

Closed
Wauplin opened this issue Sep 11, 2023 · 0 comments · Fixed by #1711
Labels
good first issue Good for newcomers

Comments

@Wauplin
Contributor

Wauplin commented Sep 11, 2023

In InferenceClient.text_generation, when using a TGI endpoint with stream=True, the server may return a serialized error instead of a token (e.g. an out-of-memory error). When that happens, we should parse the error payload and raise an appropriate Python exception. At the moment a Pydantic validation error is raised instead, because the payload cannot be validated against the TextGenerationStreamResponse schema.
Once an error is received, we can safely assume that no more tokens will arrive in the stream.

The Rust schema is:

#[derive(Serialize, ToSchema)]
pub(crate) struct ErrorResponse {
    pub error: String,
    pub error_type: String,
}
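
Not stated in the issue itself, but for context: TGI streams responses as server-sent events, so such an error would typically arrive as a data: line whose JSON body matches the schema above rather than a token. An illustrative payload (the message and error_type values are made up):

data: {"error": "Request failed during generation: CUDA out of memory", "error_type": "generation"}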

The logic to parse a stream of tokens from TGI can be found in _stream_text_generation_response.
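
A minimal sketch of the kind of handling being asked for (this is not the actual huggingface_hub code; the helper names and the exception type are placeholders for illustration):

import json
from typing import Any, Dict, Iterator


def _parse_sse_payload(byte_payload: bytes) -> Dict[str, Any]:
    # Server-sent events from TGI look like b'data: {...}'.
    if byte_payload.startswith(b"data:"):
        byte_payload = byte_payload[len(b"data:"):]
    return json.loads(byte_payload.decode("utf-8"))


def stream_tokens(byte_lines: Iterator[bytes]) -> Iterator[Dict[str, Any]]:
    for byte_payload in byte_lines:
        if not byte_payload.strip():
            continue  # skip SSE keep-alive / empty lines
        payload = _parse_sse_payload(byte_payload)

        # If the payload matches the ErrorResponse schema, surface it as a
        # Python exception instead of failing schema validation. No more
        # tokens will follow, so stopping the iteration here is safe.
        if "error" in payload:
            raise RuntimeError(
                f"Error from text-generation-inference "
                f"({payload.get('error_type', 'unknown')}): {payload['error']}"
            )

        # Otherwise the payload is a TextGenerationStreamResponse-shaped dict
        # and can be validated / yielded as before.
        yield payload

In an actual fix this would presumably raise a dedicated exception type rather than RuntimeError; the key point is checking for the error shape before attempting schema validation.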

@Wauplin Wauplin added this to the in next release? milestone Sep 11, 2023
@Wauplin Wauplin added the good first issue Good for newcomers label Sep 29, 2023