Issues: huggingface/text-generation-inference
CUDA Out of memory when using the benchmarking tool with batch size greater than 1 (#2952, opened Jan 24, 2025 by mborisov-bi)
Serverless Inference API OpenAI /v1/chat/completions route broken (#2946, opened Jan 23, 2025 by pelikhan)
RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model (#2944, opened Jan 23, 2025 by edesalve)
text-generation-inference:3.0.1 docker container timeout on image fetching from fastapi static files (#2930, opened Jan 21, 2025 by dinoelT)
Mangled generation for string sequences containing <space>'m with Llama 3.1 (#2927, opened Jan 20, 2025 by tomjorquera)
AttributeError: no attribute 'model' when using llava-next with lora-adapters (#2926, opened Jan 20, 2025 by derkleinejakob)
Does tgi support image resize for qwen2-vl pipeline? (#2920, opened Jan 16, 2025 by AHEADer)
CUDA: an illegal memory access was encountered with Mistral FP8 Marlin kernels on NVIDIA driver 535.216.01 (AWS Sagemaker Real-time Inference) (#2915, opened Jan 15, 2025 by dwyatte)
Slow when using response format with JSON schemas with 8+ optional properties (#2902, opened Jan 11, 2025 by TwirreM)
Support reponse_format: {"type": "json_object"} without any constrained schema (#2899, opened Jan 10, 2025 by lhoestq)
Automatic Calculation of Sequence Length in TGI v3 Leads to Unrealistic Values Before CUDA OOM (#2897, opened Jan 10, 2025 by biba10)
Prefill operation can be significantly slower in TGI v3 vs TGI v2 (#2896, opened Jan 10, 2025 by biba10)
[Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8] Bad Responses with High Concurrent Requests (#2894, opened Jan 9, 2025 by michaelact)
make install-server does not have Apple MacOS Metal Framework (#2890, opened Jan 8, 2025 by qdrddr)
summarization using fine-tuned flan-t5 model in TGI outputs "generated text" instead of "summary_text" and outputs are completely different (#2889, opened Jan 7, 2025 by maiiabocharova)
Qwen2-VL failed to infer multiple images (Server error: upper bound and larger bound inconsistent with step sign) (#2888, opened Jan 7, 2025 by AHEADer)