Description
#7567 introduced a reference counter for shared memory regions to prevent users from releasing a shared memory region while any inference requests are still being processed.
I encountered an issue where the shared memory region could not be freed even when no inference request was ongoing.
Error I got:
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Cannot unregister shared memory region 'input_eec14889775b4c29a99d81458b0feb1a', it is currently in use
Triton Information
Triton Server 2.50.0 from NGC container 24.09
To Reproduce
Extract model_repository.zip into a directory named model_repository/. The directory should have the following layout after extraction:
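A typical layout for an FIL-backend XGBoost model looks roughly like the listing below; model stands in for the actual model directory name inside the zip:

```
model_repository/
└── model/
    ├── config.pbtxt
    └── 1/
        └── xgboost.json
```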
The example uses an XGBoost model saved in the xgboost_json format and loads it with the FIL backend.
Notably, the model is configured to use the CPU for inference (instance_group [{ kind: KIND_CPU }]).
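A minimal config.pbtxt for such a setup might look like the sketch below. The dimensions and max_batch_size are illustrative placeholders; only the instance_group and xgboost_json settings come from the description above:

```
backend: "fil"
max_batch_size: 32768
input [
  { name: "input__0" data_type: TYPE_FP32 dims: [ 32 ] }
]
output [
  { name: "output__0" data_type: TYPE_FP32 dims: [ 1 ] }
]
instance_group [{ kind: KIND_CPU }]
parameters [
  { key: "model_type" value: { string_value: "xgboost_json" } },
  { key: "output_class" value: { string_value: "false" } }
]
```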
Launch the Triton server using the latest NGC container:
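The exact launch command is not reproduced here, but a typical invocation of the 24.09 container would be something like:

```bash
docker run --rm --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v "$(pwd)/model_repository:/models" \
  nvcr.io/nvidia/tritonserver:24.09-py3 \
  tritonserver --model-repository=/models
```

Then create a test.py that registers CUDA shared memory regions, runs a batch of inferences over gRPC, and finally unregisters the regions, and run it against the server. The original script is collapsed in the issue; the sketch below only reconstructs that flow, and the model name, tensor names, and sizes are placeholders that must match the actual model configuration:

```python
# Hypothetical reconstruction of the reproduction flow; MODEL_NAME, the tensor
# names, and the sizes below are placeholders and must match the real model.
import uuid

import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm
from tritonclient.utils import np_to_triton_dtype

MODEL_NAME = "model"        # placeholder
INPUT_NAME = "input__0"     # FIL backend input tensor name
OUTPUT_NAME = "output__0"   # FIL backend output tensor name
NUM_FEATURES = 32           # placeholder
NUM_INFERENCES = 1000       # 1000-5000 inferences reliably trigger the error


def main():
    client = grpcclient.InferenceServerClient(url="localhost:8001")

    data = np.random.rand(1, NUM_FEATURES).astype(np.float32)
    input_bytes = data.size * data.itemsize
    output_bytes = 4  # one FP32 value per row (placeholder)

    # Create and register CUDA shared memory regions for input and output.
    input_region = f"input_{uuid.uuid4().hex}"
    output_region = f"output_{uuid.uuid4().hex}"
    print(f"Create CUDA shared mem: name = {input_region}")
    input_handle = cudashm.create_shared_memory_region(input_region, input_bytes, 0)
    cudashm.set_shared_memory_region(input_handle, [data])
    client.register_cuda_shared_memory(
        input_region, cudashm.get_raw_handle(input_handle), 0, input_bytes
    )
    print(f"Create CUDA shared mem: name = {output_region}")
    output_handle = cudashm.create_shared_memory_region(output_region, output_bytes, 0)
    client.register_cuda_shared_memory(
        output_region, cudashm.get_raw_handle(output_handle), 0, output_bytes
    )

    # Point the request input/output at the registered regions.
    infer_input = grpcclient.InferInput(
        INPUT_NAME, list(data.shape), np_to_triton_dtype(data.dtype)
    )
    infer_input.set_shared_memory(input_region, input_bytes)
    infer_output = grpcclient.InferRequestedOutput(OUTPUT_NAME)
    infer_output.set_shared_memory(output_region, output_bytes)

    # Run the inferences; all of them have returned before unregistering below.
    for _ in range(NUM_INFERENCES):
        client.infer(MODEL_NAME, inputs=[infer_input], outputs=[infer_output])

    # Unregistering intermittently fails with "Cannot unregister shared memory
    # region ..., it is currently in use" even though no request is in flight.
    print(f"Free CUDA shared mem: name = {input_region}")
    client.unregister_cuda_shared_memory(name=input_region)
    cudashm.destroy_shared_memory_region(input_handle)
    print(f"Free CUDA shared mem: name = {output_region}")
    client.unregister_cuda_shared_memory(name=output_region)
    cudashm.destroy_shared_memory_region(output_handle)


if __name__ == "__main__":
    main()
```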
The script will crash, producing a stack trace that looks like the following:
Create CUDA shared mem: name = input_660453d16483422bb3839a093360fd55
Create CUDA shared mem: name = output_012419c8da8443da9100011d565930cf
Free CUDA shared mem: name = input_660453d16483422bb3839a093360fd55
Traceback (most recent call last):
File "/workspace/test.py", line 216, in <module>
main(parsed_args)
File "/workspace/test.py", line 195, in main
infer(protocol=args.protocol, host=args.host, shared_mem="cuda")
File "/workspace/test.py", line 189, in infer
release_shared_memory(triton_client, inputs)
File "/workspace/test.py", line 152, in release_shared_memory
triton_client.unregister_cuda_shared_memory(name=io_.name)
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 1443, in unregister_cuda_shared_memory
raise_error_grpc(rpc_error)
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Cannot unregister shared memory region 'input_660453d16483422bb3839a093360fd55', it is currently in use.
If the script does not crash, run it a few more times.
Expected behavior
The test script should complete without crashing.
Some observations
The test script does not crash if the number of inferences is reduced to under 100. On the other hand, increasing the number of inferences to 1000-5000 reliably triggers the error.
The test script does not crash if the HTTP protocol is used instead of gRPC. (Replace argument --protocol grpc with --protocol http.)
The test script does not crash if the model is configured to use the GPU for inference (instance_group [{ kind: KIND_GPU }]).