Merge pull request #546 from robertknight/docs-update
Update notes about data type and quantization support in the docs
robertknight authored Jan 25, 2025
2 parents 046e743 + f74c431 commit fac90ba
Showing 3 changed files with 30 additions and 14 deletions.
README.md: 4 additions & 3 deletions

@@ -32,11 +32,12 @@ planned for the future:
 
 - Supports CPU inference only. There is currently no support for running models
   on GPUs or other accelerators.
-- Not all ONNX operators are currently supported. See `OperatorType` in
+- Not all ONNX operators are supported. See `OperatorType` in
   [src/schema.fbs](src/schema.fbs) and [this issue](https://github.com/robertknight/rten/issues/14) for currently supported operators. For
   implemented operators, some attributes or input shapes may not be supported.
-- A limited set of data types are supported: float32 and int32 tensors. int64
-  and boolean tensors are converted to int32.
+- Not all ONNX data types are supported. Currently supported data types for
+  tensors are: float32, int32, int64 (converted to int32), bool (converted to
+  int32), int8, uint8.
 - RTen is not as well optimized as more mature runtimes such as ONNX Runtime
   or TensorFlow Lite. The performance difference depends on the operators used,
   model structure, CPU architecture and platform.
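Since ONNX `int64` and `bool` tensors become `int32` after conversion, inputs for such models must be narrowed before inference. Below is a minimal sketch of that preparation in plain Rust; the helper names are illustrative and not part of the RTen API:

```rust
use std::num::TryFromIntError;

/// Narrow i64 values (eg. token IDs from a tokenizer) to the i32
/// representation a converted model expects, failing on overflow
/// rather than silently truncating.
fn narrow_i64(values: &[i64]) -> Result<Vec<i32>, TryFromIntError> {
    values.iter().map(|&v| i32::try_from(v)).collect()
}

/// Widen bool flags (eg. attention masks) to i32 for the same reason.
fn bools_to_i32(values: &[bool]) -> Vec<i32> {
    values.iter().map(|&v| v as i32).collect()
}

fn main() {
    let ids = narrow_i64(&[101, 2023, 102]).expect("ID out of i32 range");
    let mask = bools_to_i32(&[true, true, false]);
    println!("ids: {ids:?}, mask: {mask:?}");
}
```

Using `i32::try_from` keeps an out-of-range value a loud error rather than a silent truncation.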
docs/performance.md: 0 additions & 5 deletions

@@ -127,11 +127,6 @@ parallelism or other factors.
 
 ## Optimizing inference
 
-RTen does not currently have many turn-key solutions for optimizing inference,
-like `torch.compile` for PyTorch or ONNX Runtime's [graph
-optimizations](https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html).
-These are planned for the future.
-
 Some ways to speed up inference without changing RTen's code are:
 
 - If choosing from a family of models with different sizes, you can trade
src/lib.rs: 26 additions & 6 deletions

@@ -46,16 +46,25 @@
 //!
 //! RTen currently executes models on the CPU. It can build for most
 //! architectures that the Rust compiler supports. SIMD acceleration is
-//! available for x86-64, Arm Neon and WebAssembly. For x86-64, AVX-512 support
+//! available for x86-64, Arm 64 and WebAssembly. For x86-64, AVX-512 support
 //! is available but requires Nightly Rust and enabling the `avx512` crate
 //! feature.
 //!
 //! ## Data types
 //!
-//! RTen supports `f32` and `i32` data types. Models with `i64` and `bool`
-//! tensors are supported, but these are converted to `i32` by the conversion
-//! tool. Supported for lower-precision types (16-bit floats, 8-bit integers
-//! etc.) is planned for the future.
+//! RTen supports tensors with the following data types:
+//!
+//! - `f32`, `i32`, `i8`, `u8`
+//! - `i64` and `bool` tensors are supported by converting them to `i32` as
+//!   part of the model conversion process. When preparing model inputs that
+//!   expect these data types in ONNX, you will need to convert them to `i32`.
+//!
+//! Some operators support a more limited set of data types than described in
+//! the ONNX specification. Please file an issue if you need an operator to
+//! support additional data types.
+//!
+//! Support for additional types (eg. `f16`, `bf16`) is planned for the
+//! future.
 //!
 //! ## Operators
 //!
@@ -69,10 +78,21 @@
 //! - The `random` feature enables operators that generate random numbers (eg.
 //!   `RandomUniform`).
 //!
+//! ## Quantized models
+//!
+//! RTen supports quantized models where activations are in uint8 format and
+//! weights are in int8 format. This combination is the default when an ONNX
+//! model is quantized using [dynamic
+//! quantization](https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#dynamic-quantization).
+//! The `tools/ort-quantize.py` script in the RTen repository can be used to
+//! quantize an existing model with float tensors into this format.
+//!
 //! # Inspecting models
 //!
 //! The [rten-cli](https://crates.io/crates/rten-cli) tool can be used to query
-//! basic information about a `.rten` model.
+//! basic information about a `.rten` model, such as the inputs and outputs.
+//! It can also be used to test model compatibility and inference performance
+//! by running models with randomly generated inputs.
 //!
 //! # Performance
 //!
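To tie the data-type and quantization notes together, here is a minimal end-to-end sketch patterned on the crate's README example. The model file, input IDs and output shape are hypothetical, and the API calls (`Model::load_file`, `run_one`) are assumed from the crate's documentation around this release:

```rust
use std::error::Error;

use rten::Model;
use rten_tensor::prelude::*;
use rten_tensor::NdTensor;

fn main() -> Result<(), Box<dyn Error>> {
    // Load a model converted with rten-convert. A dynamically quantized
    // model (uint8 activations, int8 weights) loads and runs the same way.
    let model = Model::load_file("text-classifier.rten")?;

    // This hypothetical model took int64 token IDs in ONNX; after
    // conversion it expects i32, so the IDs are passed as i32 here.
    let input_ids = NdTensor::from([[101i32, 2023, 2003, 102]]);

    // Run with a single input and convert the first output (assumed to
    // be [batch, class] scores) back into a typed tensor.
    let output: NdTensor<f32, 2> = model
        .run_one(input_ids.view().into(), None)?
        .try_into()?;
    println!("output shape: {:?}", output.shape());
    Ok(())
}
```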