Benchmark activations using different implementations. #1156
Comments
Just a note I'll be bringing this up at the monthly meeting to see if TF core team has a standard process for this.
Is it possible to join this meeting? At least as an observer?
Of course! Makes me think we're not properly advertising this meeting. It's open to all, though you need to be subscribed to the mailing list in order to join.
TensorFlow core has some benchmarking as well (e.g. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/data/benchmarks/list_files_benchmark.py). Maybe we can build a similar module to provide benchmarking utilities for the operations/kernels/functions in tf-addons.
How about we test the 2nd-order gradients of the activations while we're at it? Refers to #1099
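As a hedged sketch of what a 2nd-order gradient check could look like (function names here are illustrative; in TensorFlow the second derivative would come from two nested `tf.GradientTape` contexts, while this standalone example uses a NumPy finite-difference approximation instead):

```python
import numpy as np

def second_derivative_fd(fn, x, eps=1e-3):
    # Central finite-difference estimate of f''(x); a TF version would
    # nest two tf.GradientTape contexts rather than differencing.
    return (fn(x + eps) - 2.0 * fn(x) + fn(x - eps)) / eps**2

# Example activation: softplus(x) = log(1 + e^x).
# Its second derivative is sigmoid(x) * (1 - sigmoid(x)).
softplus = lambda x: np.log1p(np.exp(x))
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

x = 0.5
approx = second_derivative_fd(softplus, x)
exact = sigmoid(x) * (1.0 - sigmoid(x))
```

A custom-op implementation would be compared against the pure-Python one by asserting that both agree with such a reference value.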
Looks like we're blocked by googlecolab/colabtools#1034
I ran some numbers and here are the results: https://colab.research.google.com/drive/1rLb4EuydbFg9PbhboXhCDqopcl6BmphG I avoided the data copy by fetching only a single scalar at the end; it still forced computation on the whole tensor. Those plots were made using Colab. If you run the notebook locally you'll get different results; please post them, as they might be interesting.
Closing as we have switched to pure Python ops.
See #1137 (comment)
We now have multiple implementations for activations on CPU and GPU. It might be worth benchmarking them, because it's not obvious which ones are faster. On CPU, for big tensors, pure Python in eager mode seems to be faster than custom C++ ops.
See https://colab.research.google.com/drive/1LTx3vMpA1fLCESKl-_WrLIp0Fq1zYZL0
for benchmarking on CPU (when running the notebook, make sure the GPU is not available in Colab).
For GPU and CPU, 4 implementations should be tested:
We should also test both big and small tensors. Ideally we'd have plots with 4 curves: the x axis being the number of elements in the input tensor, the y axis being the speed (number of elements processed per second). Don't forget to call `.numpy()` on the result to force execution (ops might get executed lazily). To capture the result of a `%timeit`, use the `-o` flag.

If everything could be delivered in a pretty notebook on Colab with the link in this issue (no pull request), it would be super duper awesome 😃
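A minimal sketch of such a benchmark loop (the activation here is an illustrative NumPy softplus standing in for the tf-addons implementations, and `timeit.Timer` stands in for the notebook's `%timeit -o`):

```python
import timeit
import numpy as np

# Stand-in for the implementations under test; in the notebook these
# would be the pure-Python TF op, the custom C++ op, etc.
def softplus_py(x):
    # Numerically stable softplus: log(1 + e^x).
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

implementations = {"pure_python": softplus_py}

def benchmark(fn, n_elements, repeats=5):
    x = np.random.randn(n_elements).astype(np.float32)
    # Fetch a single scalar at the end to force full execution
    # without copying the whole tensor back (as in the notebook).
    timer = timeit.Timer(lambda: float(fn(x)[0]))
    seconds = min(timer.repeat(repeat=repeats, number=1))
    return n_elements / seconds  # elements processed per second

for name, fn in implementations.items():
    for n in (10**3, 10**6):
        rate = benchmark(fn, n)
        print(f"{name:12s} n={n:>8d}  {rate:.3e} elem/s")
```

Plotting `rate` against `n` for each implementation would give the 4-curve plot described above.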