Merge pull request #546 from robertknight/docs-update
Update notes about data type and quantization support in the docs
robertknight authored Jan 25, 2025
2 parents 046e743 + f74c431 commit fac90ba
Showing 3 changed files with 30 additions and 14 deletions.
README.md: 4 additions & 3 deletions

@@ -32,11 +32,12 @@ planned for the future:
 
 - Supports CPU inference only. There is currently no support for running models
   on GPUs or other accelerators.
-- Not all ONNX operators are currently supported. See `OperatorType` in
+- Not all ONNX operators are supported. See `OperatorType` in
   [src/schema.fbs](src/schema.fbs) and [this issue](https://github.com/robertknight/rten/issues/14) for currently supported operators. For
   implemented operators, some attributes or input shapes may not be supported.
-- A limited set of data types are supported: float32 and int32 tensors. int64
-  and boolean tensors are converted to int32.
+- Not all ONNX data types are supported. Currently supported data types for
+  tensors are: float32, int32, int64 (converted to int32), bool (converted to
+  int32), int8, uint8.
 - RTen is not as well optimized as more mature runtimes such as ONNX Runtime
   or TensorFlow Lite. The performance difference depends on the operators used,
   model structure, CPU architecture and platform.
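Since ONNX `int64` and `bool` tensors become `int32` after conversion, inputs for such models must be narrowed before inference. Below is a minimal sketch of that preparation in plain Rust; the helper names are illustrative and not part of the RTen API:

```rust
use std::num::TryFromIntError;

/// Narrow i64 values (eg. token IDs from a tokenizer) to the i32
/// representation a converted model expects, failing on overflow
/// rather than silently truncating.
fn narrow_i64(values: &[i64]) -> Result<Vec<i32>, TryFromIntError> {
    values.iter().map(|&v| i32::try_from(v)).collect()
}

/// Widen bool flags (eg. attention masks) to i32 for the same reason.
fn bools_to_i32(values: &[bool]) -> Vec<i32> {
    values.iter().map(|&v| v as i32).collect()
}

fn main() {
    let ids = narrow_i64(&[101, 2023, 102]).expect("ID out of i32 range");
    let mask = bools_to_i32(&[true, true, false]);
    println!("ids: {ids:?}, mask: {mask:?}");
}
```

Using `i32::try_from` keeps an out-of-range value a loud error rather than a silent truncation.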
docs/performance.md: 0 additions & 5 deletions

@@ -127,11 +127,6 @@ parallelism or other factors.
 
 ## Optimizing inference
 
-RTen does not currently have many turn-key solutions for optimizing inference,
-like `torch.compile` for PyTorch or ONNX Runtime's [graph
-optimizations](https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html).
-These are planned for the future.
-
 Some ways to speed up inference without changing RTen's code are:
 
 - If choosing from a family of models with different sizes, you can trade
src/lib.rs: 26 additions & 6 deletions

@@ -46,16 +46,25 @@
 //!
 //! RTen currently executes models on the CPU. It can build for most
 //! architectures that the Rust compiler supports. SIMD acceleration is
-//! available for x86-64, Arm Neon and WebAssembly. For x86-64, AVX-512 support
+//! available for x86-64, Arm 64 and WebAssembly. For x86-64, AVX-512 support
 //! is available but requires Nightly Rust and enabling the `avx512` crate
 //! feature.
 //!
 //! ## Data types
 //!
-//! RTen supports `f32` and `i32` data types. Models with `i64` and `bool`
-//! tensors are supported, but these are converted to `i32` by the conversion
-//! tool. Supported for lower-precision types (16-bit floats, 8-bit integers
-//! etc.) is planned for the future.
+//! RTen supports tensors with the following data types:
+//!
+//! - `f32`, `i32`, `i8`, `u8`
+//! - `i64` and `bool` tensors are supported by converting them to `i32` as
+//!   part of the model conversion process. When preparing model inputs that
+//!   expect these data types in ONNX, you will need to convert them to `i32`.
+//!
+//! Some operators support a more limited set of data types than described in
+//! the ONNX specification. Please file an issue if you need an operator to
+//! support additional data types.
+//!
+//! Support for additional types (eg. `f16`, `bf16`) is planned for the
+//! future.
 //!
 //! ## Operators
 //!
@@ -69,10 +78,21 @@
 //! - The `random` feature enables operators that generate random numbers (eg.
 //!   `RandomUniform`).
 //!
+//! ## Quantized models
+//!
+//! RTen supports quantized models where activations are in uint8 format and
+//! weights are in int8 format. This combination is the default when an ONNX
+//! model is quantized using [dynamic
+//! quantization](https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#dynamic-quantization).
+//! The `tools/ort-quantize.py` script in the RTen repository can be used to
+//! quantize an existing model with float tensors into this format.
+//!
 //! # Inspecting models
 //!
 //! The [rten-cli](https://crates.io/crates/rten-cli) tool can be used to query
-//! basic information about a `.rten` model.
+//! basic information about a `.rten` model, such as the inputs and outputs.
+//! It can also be used to test model compatibility and inference performance
+//! by running models with randomly generated inputs.
 //!
 //! # Performance
 //!
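To tie the data-type and quantization notes together, here is a minimal end-to-end sketch patterned on the crate's README example. The model file, input IDs and output shape are hypothetical, and the API calls (`Model::load_file`, `run_one`) are assumed from the crate's documentation around this release:

```rust
use std::error::Error;

use rten::Model;
use rten_tensor::prelude::*;
use rten_tensor::NdTensor;

fn main() -> Result<(), Box<dyn Error>> {
    // Load a model converted with rten-convert. A dynamically quantized
    // model (uint8 activations, int8 weights) loads and runs the same way.
    let model = Model::load_file("text-classifier.rten")?;

    // This hypothetical model took int64 token IDs in ONNX; after
    // conversion it expects i32, so the IDs are passed as i32 here.
    let input_ids = NdTensor::from([[101i32, 2023, 2003, 102]]);

    // Run with a single input and convert the first output (assumed to
    // be [batch, class] scores) back into a typed tensor.
    let output: NdTensor<f32, 2> = model
        .run_one(input_ids.view().into(), None)?
        .try_into()?;
    println!("output shape: {:?}", output.shape());
    Ok(())
}
```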