
# ONNX -> TensorRT INT8

These scripts were last tested using the NGC TensorRT container, version `20.06-py3`. The corresponding framework versions for this container are listed in its release notes.

## Quickstart

**NOTE:** This INT8 example is only valid for fixed-shape ONNX models at the moment.

INT8 calibration on dynamic-shape models is now supported; however, this example has not yet been updated to reflect that. For more details on INT8 calibration for dynamic-shape models, please see the documentation.

### 1. Convert ONNX model to TensorRT INT8

See `./onnx_to_tensorrt.py -h` for a full list of command-line arguments.

```bash
./onnx_to_tensorrt.py --explicit-batch \
                      --onnx resnet50/model.onnx \
                      --fp16 \
                      --int8 \
                      --calibration-cache="caches/yolov6.cache" \
                      -o resnet50.int8.engine
```
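If the build succeeds, one quick sanity check is deserializing the engine with the TensorRT Python API. This is a minimal sketch, not part of the repo's scripts; the filename matches the `-o` argument above:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine written by onnx_to_tensorrt.py to confirm it loads,
# then list its input/output binding names.
with open("resnet50.int8.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    assert engine is not None, "failed to deserialize engine"
    print([engine.get_binding_name(i) for i in range(engine.num_bindings)])
```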

See the INT8 Calibration section below for details on calibrating with your own model or different data, in case you don't have an existing calibration cache or want to create a new one.

## INT8 Calibration

See `Calibrator.py` for a reference implementation of TensorRT's `IInt8EntropyCalibrator2`.

This class can be tweaked to work for other kinds of models, inputs, etc.
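For orientation, here is a minimal sketch of what an `IInt8EntropyCalibrator2` subclass looks like in the TensorRT Python API. It is not the repo's `Calibrator.py`: the class name, constructor arguments, and batch handling are illustrative.

```python
import os

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt


class MinimalEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds pre-processed batches to TensorRT during INT8 calibration and
    caches the resulting scales so later builds can skip calibration."""

    def __init__(self, batches, cache_file, batch_size=1):
        super().__init__()
        self.batches = iter(batches)  # each item: np.float32 array, shape (N, C, H, W)
        self.cache_file = cache_file
        self.batch_size = batch_size
        self.device_input = None

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None  # tells TensorRT the calibration data is exhausted
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # If a cache already exists, TensorRT uses it and never asks for batches.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

TensorRT calls `get_batch()` repeatedly until it returns `None`, then persists the computed scales through `write_calibration_cache()`; on later builds, `read_calibration_cache()` short-circuits the whole process.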

In the Quickstart section above, we made use of a pre-existing cache, `caches/yolov6.cache`, to save time for the sake of the example.

However, to calibrate using different data or a different model, use the `--calibration-data` argument:

- This requires that you've mounted a dataset, such as ImageNet, to use for calibration.
  - Add something like `-v /imagenet:/imagenet` to your Docker command in Step (1) to mount a dataset found locally at `/imagenet`.
- You can specify your own `preprocess_func` by defining it inside of `Calibrator.py`.
```bash
# Path to dataset to use for calibration.
#   **Not necessary if you already have a calibration cache from a previous run.
CALIBRATION_DATA="/imagenet"

# Truncate calibration images to a random sample of this amount if more are found.
#   **Not necessary if you already have a calibration cache from a previous run.
MAX_CALIBRATION_SIZE=512

# Calibration cache to be used instead of calibration data if it already exists,
# or the cache will be created from the calibration data if it doesn't exist.
CACHE_FILENAME="caches/yolov6.cache"

# Name of the pre-processing function to apply to each calibration image
# ("preprocess_imagenet" is an example name -- use one defined in Calibrator.py).
PREPROCESS_FUNC="preprocess_imagenet"

# Path to ONNX model
ONNX_MODEL="model/yolov6.onnx"

# Path to write TensorRT engine to
OUTPUT="yolov6.int8.engine"

# Creates an int8 engine from your ONNX model, creating ${CACHE_FILENAME} based
# on your ${CALIBRATION_DATA}, unless ${CACHE_FILENAME} already exists, in which
# case it will simply use that instead.
python3 onnx_to_tensorrt.py --fp16 --int8 -v \
        --max_calibration_size=${MAX_CALIBRATION_SIZE} \
        --calibration-data=${CALIBRATION_DATA} \
        --calibration-cache=${CACHE_FILENAME} \
        --preprocess_func=${PREPROCESS_FUNC} \
        --explicit-batch \
        --onnx ${ONNX_MODEL} -o ${OUTPUT}
```

## Pre-processing

In order to calibrate your model correctly, you should pre-process your data the same way that you would during inference.
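For example, a model trained on ImageNet-style inputs might pre-process each calibration image with something like the sketch below. `preprocess_imagenet` is a hypothetical name here; the resize dimensions, mean, and std must match your model's training pipeline.

```python
import numpy as np
from PIL import Image


def preprocess_imagenet(image_path, size=(224, 224)):
    """Mirror the inference-time pipeline: resize, scale to [0, 1],
    normalize with ImageNet mean/std, and reorder HWC -> CHW."""
    img = Image.open(image_path).convert("RGB").resize(size)
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
```

If calibration pre-processing diverges from inference pre-processing, the INT8 scales are computed against activations the engine never actually sees at runtime, and accuracy will degrade.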