
Benchmark activations using different implementations. #1156

Closed

gabrieldemarmiesse opened this issue Feb 26, 2020 · 9 comments
Labels
benchmarking, custom-ops, help wanted

Comments

@gabrieldemarmiesse
Member

gabrieldemarmiesse commented Feb 26, 2020

See #1137 (comment)

We now have multiple implementations of the activations on CPU and GPU, and it's not obvious which are faster, so benchmarking them is worthwhile. On CPU, for big tensors, pure Python in eager mode seems to be faster than the custom C++ ops.

See https://colab.research.google.com/drive/1LTx3vMpA1fLCESKl-_WrLIp0Fq1zYZL0 for benchmarking on CPU (when running the notebook, make sure no GPU is attached to the Colab runtime).

For both GPU and CPU, 4 implementations should be tested (a sketch of each variant follows the list):

  • Custom kernel (C++ or CUDA)
  • Pure python in eager mode
  • Pure python with tf.function
  • Pure python with XLA
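
For concreteness, here is a minimal sketch of the four variants, using hardshrink as a stand-in (the pure-Python body and function names are illustrative; `experimental_compile` was the XLA flag at the time, later renamed `jit_compile`):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# 1. Custom kernel (C++ or CUDA): at the time of this issue,
#    tfa.activations.hardshrink dispatched to the compiled custom op.
custom_op = tfa.activations.hardshrink

# 2. Pure Python in eager mode: the same math written with TF ops.
def hardshrink_py(x, lower=-0.5, upper=0.5):
    mask = tf.logical_or(x < lower, x > upper)
    return tf.where(mask, x, tf.zeros_like(x))

# 3. Pure Python wrapped in tf.function (graph mode).
hardshrink_fn = tf.function(hardshrink_py)

# 4. Pure Python compiled with XLA.
hardshrink_xla = tf.function(hardshrink_py, experimental_compile=True)
```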

We should also test both big and small tensors. Ideally we would have plots with 4 curves: the x axis is the number of elements in the input tensor, and the y axis is the throughput (number of elements processed per second). Don't forget to call .numpy() to force execution (ops might otherwise be executed lazily).
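
A rough sketch of the measurement loop behind such a plot (`elements_per_second` and the `tf.nn.softplus` stand-in are illustrative, not part of the codebase):

```python
import time
import tensorflow as tf

def elements_per_second(func, num_elements, warmup=3, runs=10):
    # Rough throughput measurement for one implementation.
    x = tf.random.uniform((num_elements,))
    for _ in range(warmup):
        func(x).numpy()            # warm-up covers tracing/compilation
    start = time.perf_counter()
    for _ in range(runs):
        func(x).numpy()            # .numpy() forces execution
    elapsed = time.perf_counter() - start
    return num_elements * runs / elapsed

# One curve per implementation, from small to big tensors.
sizes = [10 ** k for k in range(2, 8)]
throughputs = [elements_per_second(tf.nn.softplus, n) for n in sizes]
```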

To obtain the result of a %timeit, use the -o flag:

timeit_object_with_results = %timeit -o my_func(my_tensor)
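
The returned object is IPython's TimeitResult, which exposes the measured timings (e.g. `.average` and `.best`), so the throughput for a plot can be derived from it:

```python
# Elements processed per second, from the average measured run time.
elements_per_sec = my_tensor.shape.num_elements() / timeit_object_with_results.average
print(elements_per_sec)
```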

If everything could be delivered in a pretty notebook on Colab with the link posted in this issue (no pull request), it would be super duper awesome 😃

@seanpmorgan
Member

Just a note: I'll be bringing this up at the monthly meeting to see if the TF core team has a standard process for this. As previously mentioned, calling .numpy() will transfer results from GPU to CPU, so we need a way to test without that. See:
https://colab.research.google.com/drive/12ende9xXMSywP2lOKWrFJBwaDDHbkYXh

@gabrieldemarmiesse
Member Author

gabrieldemarmiesse commented Feb 26, 2020 via email

@seanpmorgan
Member

Is it possible to join this meeting? At least as an observer?

Of course! Makes me think we're not properly advertising this meeting. It's open to all, though you need to be subscribed to the mailing list in order to join:
https://github.com/tensorflow/addons#community
https://groups.google.com/a/tensorflow.org/forum/#!forum/addons
https://docs.google.com/document/d/1kxg5xIHWLY7EMdOJCdSGgaPu27a9YKpupUz2VTXqTJg/edit

@failure-to-thrive
Contributor

failure-to-thrive commented Feb 26, 2020

Keep in mind the power of GPU execution: batching. Iterating over 1000 items one at a time will definitely be slower; those 1000 should be submitted at once.
OK, now I see the (100, 100, 1000) tensors.

@feihugis
Member

TensorFlow core has some benchmarks as well (e.g. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/data/benchmarks/list_files_benchmark.py). Maybe we can build a similar module to provide benchmarking utilities for the operations/kernels/functions in tf-addons.
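
Those benchmarks subclass tf.test.Benchmark and call report_benchmark. A minimal eager-mode sketch of what an Addons equivalent could look like (class and method names here are made up):

```python
import time
import tensorflow as tf

class ActivationBenchmark(tf.test.Benchmark):
    def benchmark_softplus_eager(self):
        x = tf.random.uniform((1000, 1000))
        tf.nn.softplus(x).numpy()              # warm-up run
        iters = 100
        start = time.perf_counter()
        for _ in range(iters):
            tf.nn.softplus(x).numpy()          # .numpy() forces execution
        wall_time = (time.perf_counter() - start) / iters
        self.report_benchmark(
            iters=iters, wall_time=wall_time, name="softplus_eager")
```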

@abhichou4
Contributor

How about we test the 2nd-order gradients of the activations while we're at it? Refs #1099.
Again, no pull requests, but it might help in figuring out which ones require custom gradients.
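
Second-order gradients can be checked with nested gradient tapes; a minimal sketch (tf.nn.softplus is a stand-in for whichever activation is under test):

```python
import tensorflow as tf

x = tf.random.uniform((100,))
with tf.GradientTape() as outer:
    outer.watch(x)
    with tf.GradientTape() as inner:
        inner.watch(x)
        y = tf.nn.softplus(x)
    dy_dx = inner.gradient(y, x)      # first-order gradient
d2y_dx2 = outer.gradient(dy_dx, x)    # second-order gradient
```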

@gabrieldemarmiesse
Member Author

Looks like we're blocked by googlecolab/colabtools#1034

@gabrieldemarmiesse gabrieldemarmiesse added the blocked (Pending something else's completion) label Mar 1, 2020
@gabrieldemarmiesse gabrieldemarmiesse removed the blocked (Pending something else's completion) label Mar 3, 2020
@gabrieldemarmiesse
Member Author

I ran some numbers and here are the results: https://colab.research.google.com/drive/1rLb4EuydbFg9PbhboXhCDqopcl6BmphG

I avoided the data copy by just fetching a single scalar at the end. It still forced the computation on the whole tensor.
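
In other words, something along these lines (a sketch; reduce_max is arbitrary, any reduction to a scalar works):

```python
import tensorflow as tf

x = tf.random.uniform((10_000_000,))
y = tf.nn.softplus(x)          # runs on GPU if one is available
# Reducing to a scalar still forces the whole computation,
# but only a single float crosses the GPU -> CPU boundary.
_ = tf.reduce_max(y).numpy()
```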

Those plots were made using Colab. If you run the notebook locally you'll get different results; please post them, as they might be interesting.

@WindQAQ
Member

WindQAQ commented Dec 17, 2020

Closing as we have changed to pure Python ops.

@WindQAQ WindQAQ closed this as completed Dec 17, 2020