
Image eats up way too many tokens #2923

Open
2 of 4 tasks
aymeric-roucher opened this issue Jan 17, 2025 · 1 comment
Comments

@aymeric-roucher

System Info

Using Inference Endpoint here: https://endpoints.huggingface.co/m-ric/endpoints/qwen2-72b-instruct-psj
ghcr.io/huggingface/text-generation-inference:3.0.1

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Here's what I'm trying to run:

import base64
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    base_url="https://lmqbs8965pj40e01.us-east-1.aws.endpoints.huggingface.cloud/v1",
    api_key=os.getenv("HF_TOKEN"),
)

# Inline the screenshot as a base64-encoded data URL
with open("./screenshot.png", "rb") as img_file:
    base64_image = base64.b64encode(img_file.read()).decode("utf-8")

client.chat.completions.create(
    model="a",  # model name is not used by the endpoint
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's on this screenshot?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
)

The image is not big, here it is:

[screenshot attached]

I get this error:

huggingface_hub.errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://lmqbs8965pj40e01.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions (Request ID: 9kQ8on)

Input validation error: `inputs` tokens + `max_new_tokens` must be <= 32768. Given: 96721 `inputs` tokens and 0 `max_new_tokens`

It seems like my image was expanded into an enormous number of input tokens, even though the original is only roughly 1000×1000 pixels.
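One possible explanation (an assumption on my part, not confirmed): if the validator tokenizes the raw base64 data URL instead of substituting the image's patch tokens, the count would blow up, since base64 inflates the payload by a factor of 4/3 and each character would cost on the order of a token. A back-of-envelope sketch:

```python
import base64
import math


def base64_len(n_bytes: int) -> int:
    """Length of the base64 encoding of n_bytes raw bytes (with padding)."""
    return math.ceil(n_bytes / 3) * 4


# A ~70 KB PNG (plausible for a 1000x1000 screenshot) becomes ~96k
# base64 characters -- the same order of magnitude as the 96721
# "inputs" tokens reported in the error message.
print(base64_len(72_500))

# Sanity check against an actual base64 encoding:
assert base64_len(72_500) == len(base64.b64encode(b"\x00" * 72_500))
```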

Expected behavior

I'd expect the uploaded image to be <1k tokens instead of ~100k tokens.

Other APIs (OpenAI, Anthropic) handle the same image fine, so I'm wondering: do they do some image-downscaling pre-processing, or is this a bug on TGI's side?
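As a client-side mitigation, downscaling and re-encoding the image before building the data URL should at least shrink the payload. A minimal sketch using Pillow (the function name, size limit, and JPEG settings are my own choices, not anything TGI prescribes):

```python
import base64
import io

from PIL import Image  # pip install Pillow


def downscaled_data_url(path: str, max_side: int = 1024) -> str:
    """Downscale an image so its longest side is <= max_side, then
    re-encode it as a base64 data URL (JPEG, to shrink the payload)."""
    img = Image.open(path).convert("RGB")
    img.thumbnail((max_side, max_side))  # in place, preserves aspect ratio
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=85)
    b64 = base64.b64encode(buf.getvalue()).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"
```

The returned string can be passed directly as the `image_url` `url` field in the reproduction script above.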

@sanbindal1990

I am also facing a similar issue, and it looks to me like TGI's validation logic counts tokens incorrectly for inline images: https://github.com/huggingface/text-generation-inference/blob/main/integration-tests/conftest.py#L668

Is there any workaround for this? The images I am passing lead to 100k+ tokens.
