Added new flag for GPU peer access API control #7261
Conversation
src/command_line_parser.cc
Outdated
@@ -373,7 +373,8 @@ enum TritonOptionId {
   OPTION_BACKEND_CONFIG,
   OPTION_HOST_POLICY,
   OPTION_MODEL_LOAD_GPU_LIMIT,
-  OPTION_MODEL_NAMESPACING
+  OPTION_MODEL_NAMESPACING,
+  OPTION_PEER_ACCESS
Question: do we gate options based on the enable-GPU compile flag?
Looks like we don't gate for GPU right now, although we do for other build options (e.g., tracing, HTTP).
@@ -777,6 +777,7 @@ SERVER_ARGS="--allow-sagemaker=true --model-control-mode=explicit \
     --load-model=simple --load-model=ensemble_add_sub_int32_int32_int32 \
     --load-model=repeat_int32 \
     --load-model=input_all_required \
+    --load-model=dynamic_batch \
Were we missing this model before? Is it correctly added?
Yes, this is a breakage caused by one of my changes in L0_trace.
I tested the pipeline; this change fixes the currently failing L0_trace.
Co-authored-by: Iman Tabrizian <[email protected]>
Added a new flag, "--enable-peer-access", at Triton startup to control the creation of the CUDA context at server startup.
Jira: https://jirasw.nvidia.com/browse/DLIS-6705
Core: triton-inference-server/core#361