-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FastAPI v1/completions/ endpoint #12101
base: main
Are you sure you want to change the base?
Conversation
63b2b15
to
d09bf3c
Compare
temperature: float = 1.0 | ||
top_p: float = 0.0 | ||
top_k: int = 1 | ||
logProb: bool = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be logprobs: int = 1
nemo/deploy/nlp/query_llm.py
Outdated
@@ -144,7 +144,7 @@ def query_llm( | |||
"choices": [{"text": sentences}], | |||
} | |||
if log_probs_output is not None: | |||
openai_response["log_probs"] = log_probs_output | |||
openai_response["logprobs"] = log_probs_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be zipped with choices and for each choice we should have:
{"logprobs": {"token_logprobs": [array_of_token_logprobs]}}
c694f2c
to
3502821
Compare
# limitations under the License. | ||
|
||
import os | ||
from pathlib import Path |
Check notice
Code scanning / CodeQL
Unused import
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 28 days ago
To fix the problem, we need to remove the unused import statement. This will clean up the code and remove the unnecessary dependency, making the code easier to read and maintain.
- Locate the import statement
from pathlib import Path
on line 12. - Remove this line from the file.
@@ -11,3 +11,2 @@ | ||
import os | ||
from pathlib import Path | ||
|
from nemo.collections.llm.deploy.base import chat_template | ||
|
||
# Load the template | ||
template = Template(chat_template) |
Check warning
Code scanning / CodeQL
Jinja2 templating with autoescape=False Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 25 days ago
To fix the problem, we need to ensure that the jinja2
environment used to render the template has autoescape
enabled. This can be done by using the select_autoescape
function when creating the Template
object. This function will automatically enable escaping for HTML and XML files, which are the most common targets for XSS attacks.
We will modify the code to create a jinja2
environment with autoescape
enabled and then use this environment to get the template and render it. This change will be made in the apply_chat_template
function.
-
Copy modified lines R136-R139 -
Copy modified line R142
@@ -135,5 +135,9 @@ | ||
from nemo.collections.llm.deploy.base import chat_template | ||
from jinja2 import Environment, select_autoescape | ||
|
||
# Create a jinja2 environment with autoescape enabled | ||
env = Environment(autoescape=select_autoescape(['html', 'xml'])) | ||
|
||
# Load the template | ||
template = Template(chat_template) | ||
template = env.from_string(chat_template) | ||
|
temperature: float = 1.0 | ||
top_p: float = 0.0 | ||
top_k: int = 1 | ||
logprobs: int = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5433d31
to
1324431
Compare
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: athitten <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree <[email protected]>
7f3a900
to
40a0845
Compare
Signed-off-by: Abhishree <[email protected]>
46078aa
to
fca66a4
Compare
Signed-off-by: Abhishree <[email protected]>
ce1a81f
to
48d422b
Compare
Signed-off-by: Abhishree <[email protected]>
78aee07
to
436e676
Compare
Signed-off-by: Abhishree <[email protected]>
9b65e79
to
04b80eb
Compare
Signed-off-by: Abhishree <[email protected]>
9e9609c
to
c52f979
Compare
Signed-off-by: Abhishree <[email protected]>
3c3900f
to
d00593f
Compare
Signed-off-by: Abhishree <[email protected]>
ceb0839
to
6e715fe
Compare
Signed-off-by: Abhishree <[email protected]>
2d5ed5b
to
9e93a89
Compare
Signed-off-by: athitten <[email protected]>
prompts = request.prompt | ||
if not isinstance(request.prompt, list): | ||
prompts = [request.prompt] | ||
output = nq.query_llm( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marta-sd I looked at it, it seems to me that the call stack of it will go to pytriton.ModelClient.infer_batch
instead of pytriton.AsyncioModelClient
and will block. See https://github.com/NVIDIA/NeMo/pull/12101/files#diff-f70646f35e4a50b01c01caf162262447c66f8f54e3b1a582e9da8ff080fc5b48R128-R129. ModelClient.infer_batch
is synchronous operation.
logging.info(f"Attempting to connect to Triton server at: {triton_url}") | ||
print("---triton_url---", triton_url) | ||
try: | ||
response = requests.get(triton_url, timeout=5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
: this might get blocking too, it's recommended to use aihttp
instead of requests
inside async functions.
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days. |
Important
The
Update branch
button must only be pressed in very rare occassions.An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information