Transpose operator is slow when source stride is a power of 2 #66

Closed
robertknight opened this issue Mar 26, 2024 · 1 comment · Fixed by #78

Comments

@robertknight
Owner

While analyzing the performance of the Transpose operator, I noticed that it is significantly slower when the copy done in contiguous_data traverses the source tensor with a power-of-2 stride for most iterations.

Examples where this happens:

  1. Transposing a square matrix whose side length N is a power of 2
  2. Moving a dimension inwards when all the dims to its right have power-of-2 sizes (eg. [4, 1500, 8, 64] => [4, 8, 64, 1500])
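To make the two cases above concrete, here is a minimal sketch of how the source strides end up as powers of 2 once the view is permuted. The helper names `contiguous_strides` and `permuted_strides` are hypothetical and are not taken from rten; strides are counted in elements and a row-major layout is assumed.

```rust
/// Row-major (contiguous) strides, in elements, for a given shape.
/// Hypothetical helper for illustration, not part of rten.
fn contiguous_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk from the second-to-last dim outwards, accumulating sizes.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Strides of a contiguous source as seen through a permuted view.
/// `perm[k]` is the source axis that becomes axis `k` of the view.
fn permuted_strides(shape: &[usize], perm: &[usize]) -> Vec<usize> {
    let strides = contiguous_strides(shape);
    perm.iter().map(|&axis| strides[axis]).collect()
}
```

For a 1024x1024 matrix, `permuted_strides(&[1024, 1024], &[1, 0])` gives `[1, 1024]`, so the innermost loop steps through the source 1024 elements at a time. For the second case, `permuted_strides(&[4, 1500, 8, 64], &[0, 2, 3, 1])` gives `[768000, 64, 1, 512]`, so the innermost stride is 8 * 64 = 512.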

To reproduce:

cargo test --release -p rten bench_transpose -- --nocapture --ignored

Note the "overhead" factors for the 512x512 and 1024x1024 matrices. Then modify the benchmark to use non-power-of-2 sizes (eg. 1023x1023) and compare again. The overhead reported relative to just copying the data becomes much lower.
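This issue does not state a cause for the difference. One common explanation for slowdowns of this kind is cache set conflicts, and the sketch below illustrates that idea only. The cache parameters (64-byte lines, 64 sets, as in a 32 KiB 8-way L1) and the function name `distinct_sets` are assumptions for illustration, not measurements from the benchmark.

```rust
use std::collections::HashSet;

/// Count the distinct cache sets touched by `count` accesses to f32
/// elements spaced `stride_elems` apart, starting at offset 0.
/// All cache parameters below are assumed, not measured.
fn distinct_sets(stride_elems: usize, count: usize) -> usize {
    const ELEM_SIZE: usize = 4; // bytes per f32
    const LINE_SIZE: usize = 64; // assumed cache line size in bytes
    const NUM_SETS: usize = 64; // assumed: 32 KiB, 8-way, 64-byte lines

    (0..count)
        // Byte offset -> cache line index -> set index.
        .map(|i| (i * stride_elems * ELEM_SIZE / LINE_SIZE) % NUM_SETS)
        .collect::<HashSet<_>>()
        .len()
}
```

Under these assumptions, 1024 accesses with a stride of 1024 elements all map to a single set, while the same number of accesses with a stride of 1023 elements spread over all 64 sets.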

Power-of-2 dimension sizes are common in real models (eg. see examples used in bench_transpose), so this happens often.

The current transpose implementation is very naive. It creates a view of the input, permutes the strides, and then iterates over the indices using a nested loop, copying source elements into a contiguous destination.
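The approach described above can be sketched for the 2D case as follows. This is a simplified illustration and not rten's actual code: the function name `transpose_naive` is hypothetical, and it assumes a row-major `f32` source held in a slice.

```rust
/// Copy a row-major `rows x cols` matrix into a new contiguous buffer
/// holding its transpose (`cols x rows`).
/// Hypothetical sketch of the naive approach, not rten's implementation.
fn transpose_naive(src: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(src.len(), rows * cols);

    // The source has strides [cols, 1]. Permuting the axes gives a view
    // with shape [cols, rows] and strides [1, cols].
    let (view_rows, view_cols) = (cols, rows);
    let (stride0, stride1) = (1, cols);

    let mut dst = Vec::with_capacity(src.len());
    for i in 0..view_rows {
        for j in 0..view_cols {
            // Writes are sequential, but the inner loop reads the source
            // `cols` elements apart on every step.
            dst.push(src[i * stride0 + j * stride1]);
        }
    }
    dst
}
```

The destination is written sequentially, while the inner loop reads the source with a stride of `cols`, which is the power-of-2 stride in the slow cases.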

@robertknight
Owner Author
