Replies: 2 comments
- This discussion on batch processing may provide some guidance.
- What should I do to enable multiple users to ask questions to the language model simultaneously and receive responses? Does llama.cpp support parallel inference for concurrent requests? In other words, how can we ensure that requests to the language model are processed in parallel rather than sequentially, so that multiple users can be served at the same time?
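For reference, here is a minimal sketch of what serving several users at once can look like against llama.cpp's built-in HTTP server. This is an illustration, not the project's documented API: the `--parallel` server flag, the default `localhost:8080` address, the `/completion` endpoint, and the `"content"` response field are assumptions based on recent llama.cpp builds, so check the server README for the version you are running.

```python
# Sketch: send several prompts to a locally running llama.cpp HTTP server
# concurrently and collect the replies.
#
# Assumptions (verify against your llama.cpp version's server README):
#   - the server was started with parallel decoding slots, e.g. roughly
#       ./llama-server -m model.gguf -c 8192 --parallel 4
#     so that several requests are decoded concurrently (continuous
#     batching) instead of being queued one after another;
#   - it listens on http://localhost:8080 and exposes a POST /completion
#     endpoint returning JSON with a "content" field.

import concurrent.futures
import requests

SERVER_URL = "http://localhost:8080/completion"  # assumed default host/port

PROMPTS = [
    "Explain what continuous batching is in one sentence.",
    "List three uses of a local LLM server.",
    "What does the term 'KV cache' mean?",
    "Summarize why parallel decoding helps multi-user serving.",
]

def ask(prompt: str) -> str:
    """Send one completion request and return the generated text."""
    resp = requests.post(
        SERVER_URL,
        # field names assumed from llama.cpp server docs for recent builds
        json={"prompt": prompt, "n_predict": 64},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json().get("content", "")

if __name__ == "__main__":
    # Each client request runs in its own thread; with enough parallel
    # slots on the server side, they are inferred concurrently rather
    # than one after another.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(PROMPTS)) as pool:
        for prompt, answer in zip(PROMPTS, pool.map(ask, PROMPTS)):
            print(f"Q: {prompt}\nA: {answer.strip()}\n")
```

The client side needs nothing special: parallelism comes from the server batching the active requests together, so plain threads (or any HTTP client issuing overlapping requests) are enough to exercise it.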