
🐛 [Bug] Unable to Change DLA Local DRAM when using torch_tensorrt.compile #2731

Closed
bionictoucan opened this issue Apr 5, 2024 · 5 comments · Fixed by #2749
Labels
bug Something isn't working

Comments

@bionictoucan

Bug Description

Hello, I am currently compiling my model to TensorRT on a Jetson AGX Orin developer kit, so I'd like to make use of the DLAs on the system. By default the DLA local DRAM is set to 1024 MiB, and I'm looking to increase this due to the size of some of the layers in my network. The network is a simple U-Net, but the first layer produces feature maps of dimension [32, 64, 592, 784], which is large enough to require more DRAM for those layers to execute on the DLA. In torch_tensorrt.compile I set the kwarg dla_local_dram_size to a different value, e.g. twice the default, but when I run the script to build the engine, the DLA local DRAM is still the default 1024 MiB.

To Reproduce

Steps to reproduce the behavior:

import torch
import torch_tensorrt
from unet import UNet #custom unet model, but inherits from nn.Module so could be replaced with any torchvision model

model = UNet(3, 64, 1).eval().half()
inputs = [torch_tensorrt.Input([32, 3, 592, 784], dtype=torch.half)]
enabled_precisions = {torch.half}
device = torch_tensorrt.Device("dla:0", allow_gpu_fallback=True)

trt_ts_model = torch_tensorrt.compile(model, inputs=inputs, enabled_precisions=enabled_precisions, device=device, dla_local_dram_size=2*1024**3)

Expected behavior

Expected the TensorRT engine to be built with a local DRAM of 2048 MiB, but instead the local DRAM remains 1024 MiB.
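For clarity, dla_local_dram_size is given in bytes, so the 2*1024**3 passed to torch_tensorrt.compile corresponds to the 2048 MiB expected here. A quick arithmetic check (plain Python, no Torch-TensorRT required):

```python
# dla_local_dram_size is specified in bytes; convert to MiB to compare
# against the 1024 MiB default reported in the build logs.
requested_bytes = 2 * 1024**3      # value passed to torch_tensorrt.compile
default_bytes = 1024 * 1024**2     # the 1024 MiB default

print(requested_bytes // 1024**2)  # -> 2048
print(default_bytes // 1024**2)    # -> 1024
```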

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.4.0
  • PyTorch Version (e.g. 1.0): 2.0.0
  • CPU Architecture: aarch64
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives: yes, torch_tensorrt is built from source
  • Python version: 3.8.10
  • CUDA version: 11.4
  • GPU models and configuration: NVIDIA Jetson AGX Orin Developer Kit
  • Any other relevant information:
@bionictoucan added the bug label Apr 5, 2024
@gs-olive
Collaborator

Hi, thanks for the report. To collect some more information: if you specify other custom DLA parameters, such as dla_global_dram_size and dla_sram_size, do those changes correctly update the corresponding parameters on your machine?

@bionictoucan
Author

Both dla_global_dram_size and dla_sram_size also keep their default values when new values are specified. I am sticking with values that are powers of 2, as per the documentation, so there shouldn't be an issue there.
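As a side note, the power-of-two constraint mentioned above is easy to verify with a small standalone helper (an illustrative sketch, not part of Torch-TensorRT):

```python
def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of two iff exactly one bit is set.
    return n > 0 and (n & (n - 1)) == 0

# The sizes discussed in this thread, in bytes:
print(is_power_of_two(1024**3))      # default local DRAM (1 GiB) -> True
print(is_power_of_two(2 * 1024**3))  # requested local DRAM (2 GiB) -> True
print(is_power_of_two(3 * 1024**2))  # a 3 MiB value would not qualify -> False
```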

@gs-olive
Collaborator

Thanks for testing this out. I am looking into the issue and will follow up with any updates.

@gs-olive
Collaborator

gs-olive commented Apr 12, 2024

I added a fix in #2749, which is now reflected on the main branch. If you prefer to keep your current version of Torch-TRT, a quick workaround is to port the fix into your build by adding lines 141-143 from https://github.com/pytorch/TensorRT/pull/2749/files to the same file within your build, which can be found at the following path:

spec = {
    "inputs": inputs,
    "input_signature": input_signature,
    "device": device,
    "disable_tf32": disable_tf32,  # Force FP32 layers to use traditional FP32 format
    "sparse_weights": sparse_weights,  # Enable sparsity for convolution and fully connected layers
    "enabled_precisions": enabled_precisions,  # Enabling FP16 kernels
    "refit": refit,  # enable refit
    "debug": debug,  # enable debuggable engine
    "capability": capability,  # Restrict kernel selection to safe gpu kernels or safe dla kernels
    "num_avg_timing_iters": num_avg_timing_iters,  # Number of averaging timing iterations used to select kernels
    "workspace_size": workspace_size,  # Maximum size of workspace given to TensorRT
    "calibrator": calibrator,
    "truncate_long_and_double": truncate_long_and_double,
    "torch_fallback": {
        "enabled": not require_full_compilation,
        "forced_fallback_ops": torch_executed_ops,
        "forced_fallback_modules": torch_executed_modules,
        "min_block_size": min_block_size,
    },
    "allow_shape_tensors": allow_shape_tensors,
}
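For context, the fix in #2749 forwards the DLA memory kwargs into this spec dict; they were previously not included, which is why the defaults persisted. The exact lines are in the PR; a hypothetical sketch of the shape of the change (key names taken from the torch_tensorrt.compile keyword arguments discussed in this thread; build_spec itself is an illustrative stand-in, not a real Torch-TensorRT function) might look like:

```python
# Hypothetical sketch only -- the authoritative change is in PR #2749.
# build_spec and its signature are illustrative stand-ins.
def build_spec(dla_sram_size, dla_local_dram_size, dla_global_dram_size, **other):
    spec = dict(other)
    # Forward the DLA memory pool sizes so they actually reach the
    # TensorRT builder instead of being silently dropped.
    spec["dla_sram_size"] = dla_sram_size
    spec["dla_local_dram_size"] = dla_local_dram_size
    spec["dla_global_dram_size"] = dla_global_dram_size
    return spec

spec = build_spec(
    dla_sram_size=1024**2,            # 1 MiB
    dla_local_dram_size=2 * 1024**3,  # 2 GiB, as requested in this issue
    dla_global_dram_size=4 * 1024**2, # 4 MiB
    device="dla:0",
)
print(sorted(k for k in spec if k.startswith("dla_")))
# -> ['dla_global_dram_size', 'dla_local_dram_size', 'dla_sram_size']
```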

@bionictoucan
Author

Can confirm the above fix works on my setup. Thanks for the help!
