
feat: improve qwen2-vl startup #2802

Merged
merged 5 commits into from
Jan 17, 2025
Conversation

drbh
Collaborator

@drbh drbh commented Dec 5, 2024

This PR resolves some small issues with qwen2-vl.

  1. Doubles the size of WARMUP_IMAGE_BASE64 from 20x20px to 40x40px (meets Qwen's minimum size requirement without a hacky fix).
  2. Removes the hacky fix that doubled the warmup image.
  3. Prefers tokenizing each request individually instead of the whole batch at once. This change allows r.truncate to be applied per request; previously it was not respected when one request in the batch was smaller than the others.
  4. Sets max_s to the maximum of max_s and the input size. This is required so that the rotary embedding creates self._cos_cached at the correct size relative to the position ids.
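Items 3 and 4 can be sketched together. This is a minimal illustration using a toy whitespace tokenizer; the function and variable names are illustrative, not TGI's actual internals:

```python
# Sketch: tokenize each request individually so its own `truncate`
# limit is respected, then clamp max_s to the longest input (so the
# rotary caches, e.g. self._cos_cached, cover every position id).
# The tokenizer here is a toy whitespace splitter, not the model's.

def tokenize(text):
    return text.split()

def prepare_batch(requests, max_s):
    batch_input_ids = []
    for text, truncate in requests:
        # per-request truncation: each request's own limit applies,
        # instead of one batch-wide truncation
        ids = tokenize(text)[:truncate]
        batch_input_ids.append(ids)
    # grow max_s if any input exceeds it, so position ids never
    # index past the rotary cos/sin caches
    longest = max(len(ids) for ids in batch_input_ids)
    return batch_input_ids, max(max_s, longest)

batch, max_s = prepare_batch([("a b c d e", 3), ("a b", 10)], max_s=2)
# batch -> [['a', 'b', 'c'], ['a', 'b']], max_s -> 3
```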

These changes resolve a startup issue reproducible with:

text-generation-launcher \
--model-id Qwen/Qwen2-VL-2B-Instruct \
--max-input-tokens 40 \
--max-batch-prefill-tokens 50 \
--max-total-tokens 51

(Note: the underlying issue triggers when max-input-tokens is less than max-batch-prefill-tokens.)
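A sketch of why these launcher flags trigger the bug, under the assumption described in item 4 above (the rotary cache was sized from max_s while the warmup batch was sized by the prefill budget; variable names are illustrative):

```python
# Sketch: the warmup batch is sized by --max-batch-prefill-tokens,
# but the rotary cos cache was sized by --max-input-tokens, so
# warmup position ids can index past the cache. Names are illustrative.
max_input_tokens = 40
max_batch_prefill_tokens = 50

cos_cache_len = max_input_tokens          # before the fix
warmup_positions = range(max_batch_prefill_tokens)
overflow = [p for p in warmup_positions if p >= cos_cache_len]
# positions 40..49 would index past the cache -> startup failure

# the fix: clamp the cache size to the largest input seen
cos_cache_len = max(cos_cache_len, max_batch_prefill_tokens)
assert all(p < cos_cache_len for p in warmup_positions)
```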

@Narsil
Collaborator

Narsil commented Dec 9, 2024

  1. from 20x20px to 40x40px (meets qwens minimal requirement without hacky fix)

I do not understand why we should impose anything on the user for the images. If 20px x 20px is not supported, we should either:

  • Rescale the image seamlessly and correctly infer on it, or
  • Reject the image with a proper error message.
    Users shouldn't have to know anything about the model's internals; 20x20px should be OK imho.
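The two options above can be sketched as follows. This is a minimal illustration, not TGI's or Qwen2-VL's actual preprocessing; MIN_SIDE is a hypothetical constant, not the model's real minimum:

```python
# Sketch of the two options: upscale an undersized image to a model
# minimum, or reject it with a clear error. MIN_SIDE is hypothetical.
MIN_SIDE = 28

def validate_or_resize(width, height, allow_resize=True):
    if width >= MIN_SIDE and height >= MIN_SIDE:
        return width, height
    if not allow_resize:
        raise ValueError(
            f"image {width}x{height}px is below the model minimum "
            f"of {MIN_SIDE}x{MIN_SIDE}px"
        )
    # scale up so the shorter side meets the minimum,
    # preserving the aspect ratio
    scale = MIN_SIDE / min(width, height)
    return round(width * scale), round(height * scale)

validate_or_resize(20, 20)  # -> (28, 28)
```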

@drbh drbh force-pushed the improve-qwen2-vl-warmup branch 2 times, most recently from 32a9564 to a3049f1 Compare December 9, 2024 21:32
@drbh drbh requested a review from Narsil December 16, 2024 15:54
@drbh drbh force-pushed the improve-qwen2-vl-warmup branch from a3049f1 to d671f6e Compare January 7, 2025 22:35
@drbh drbh changed the title feat: tokenize each request individually and increase warmup image size feat: improve qwen2-vl startup Jan 8, 2025
@drbh drbh force-pushed the improve-qwen2-vl-warmup branch from d671f6e to 320b520 Compare January 13, 2025 18:50
@drbh drbh force-pushed the improve-qwen2-vl-warmup branch from 35b528e to bd59f96 Compare January 16, 2025 16:04
@drbh
Collaborator Author

drbh commented Jan 17, 2025

Optimistically merging this PR: all tests pass, comments have been addressed, and this image has been tested/deployed in production, where it fixes a bug when starting TGI with qwen2-vl.

Will watch for regressions and roll back if needed

@drbh drbh merged commit eecca27 into main Jan 17, 2025
14 checks passed
@drbh drbh deleted the improve-qwen2-vl-warmup branch January 17, 2025 16:50
drbh added a commit that referenced this pull request Jan 17, 2025
drbh added a commit that referenced this pull request Jan 17, 2025
Revert "feat: improve qwen2-vl startup  (#2802)"

This reverts commit eecca27.