
Add --n_iters flag to CLI and perform constant propagation before running model #202

Merged: 3 commits merged into main from rten-cli-multiple-iters on May 21, 2024

Conversation

robertknight
Owner

  • The `--n_iters` flag is useful for assessing initial-run vs "warmed up" inference times for models
  • The constant propagation step highlights where models contain unnecessary operators that could be eliminated by a graph optimization (ORT does this automatically)

The flag runs model inference in a loop, with a fixed set of options passed as a
parameter to the function that runs the loop. Running the model repeatedly is
useful for assessing how much performance varies between the initial run and
runs after warm-up has completed.
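
The timing loop itself is simple. The sketch below only illustrates the idea; `run_model` and the hard-coded `n_iters` value are placeholders, not rten-cli's actual code (which is not shown in this conversation).

```rust
use std::time::Instant;

// Placeholder for a single inference call; the real CLI would invoke
// rten's model-run API here with a fixed set of options.
fn run_model() {
    // ... run inference ...
}

fn main() {
    // Iteration count, as would be supplied via a flag like `--n_iters`.
    let n_iters = 5;

    for i in 0..n_iters {
        let start = Instant::now();
        run_model();
        // The first iteration includes one-time costs (cache warm-up,
        // allocations); later iterations show steady-state times.
        println!("iteration {i}: {:.2?}", start.elapsed());
    }
}
```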
When testing `decoder_model.onnx` from
https://huggingface.co/Mozilla/distilvit/tree/main/onnx I found that it was very
slow in RTen compared to ONNX Runtime (>100ms vs 20ms). The main culprit turned
out to be an expensive `Transpose` operator with constant inputs. ORT does
constant propagation as part of its automatic optimizations when the model is
loaded. RTen doesn't have an optimization like this implemented, but by
implementing it "manually" in the CLI, we can see how much benefit this would
provide.
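
For reference, the core idea of constant propagation is to walk the graph and replace any operator whose inputs are all known at load time with its precomputed result, so it no longer runs on every inference. The sketch below uses toy types (`Node`, `eval_const_op`, `propagate_constants`, a fake `Transpose`) that are purely illustrative and do not correspond to rten's graph representation or to the code in this PR.

```rust
use std::collections::HashMap;

// Toy graph types, purely for illustration; rten's real graph differs.
#[derive(Clone, Debug)]
enum Node {
    // A value that is already known when the model is loaded.
    Constant(Vec<f32>),
    // An operator applied to named inputs.
    Operator { op: String, inputs: Vec<String> },
}

// Evaluate an operator whose inputs are all constants. Only a fake
// "Transpose" of a flat vector is handled here.
fn eval_const_op(op: &str, inputs: &[Vec<f32>]) -> Option<Vec<f32>> {
    match op {
        "Transpose" => {
            let mut data = inputs.first()?.clone();
            data.reverse(); // stand-in for a real transpose
            Some(data)
        }
        _ => None,
    }
}

// One pass of constant propagation: operators whose inputs are all
// constants are replaced by their precomputed results.
fn propagate_constants(graph: &mut HashMap<String, Node>) {
    let names: Vec<String> = graph.keys().cloned().collect();
    for name in names {
        let Some(Node::Operator { op, inputs }) = graph.get(&name).cloned() else {
            continue;
        };
        // Gather input values, but only if every input is already a constant.
        let const_inputs: Option<Vec<Vec<f32>>> = inputs
            .iter()
            .map(|input| match graph.get(input) {
                Some(Node::Constant(v)) => Some(v.clone()),
                _ => None,
            })
            .collect();
        if let Some(vals) = const_inputs {
            if let Some(result) = eval_const_op(&op, &vals) {
                graph.insert(name, Node::Constant(result));
            }
        }
    }
}

fn main() {
    let mut graph = HashMap::new();
    graph.insert("w".to_string(), Node::Constant(vec![1.0, 2.0, 3.0]));
    graph.insert(
        "w_t".to_string(),
        Node::Operator { op: "Transpose".into(), inputs: vec!["w".into()] },
    );
    propagate_constants(&mut graph);
    // `w_t` is now a Constant rather than an Operator, so the expensive
    // operator no longer runs per inference.
    println!("{:?}", graph.get("w_t"));
}
```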
robertknight merged commit 2c24dbb into main on May 21, 2024
2 checks passed
robertknight deleted the rten-cli-multiple-iters branch on May 21, 2024 at 23:20