This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Several micro optimizations #4833

Merged
merged 3 commits from benchmark-transfers into master on Dec 2, 2020

Conversation

@epwalsh (Member) commented on Dec 2, 2020

This mainly replaces calls that create a tensor and then send it to a device via `.to(device)` with calls that create the tensor directly on the device, which is over twice as fast. You can run the benchmarks yourself with:

pytest -c benchmarks/pytest.ini benchmarks/nn/util_bench.py -k 'create_tensor'

These are the results I got:

[Screenshot: benchmark results]
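The before/after pattern the PR describes can be sketched as follows (a minimal illustration; the device selection and tensor shapes here are my own, not taken from the PR):

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Before: the tensor is allocated on the CPU, then copied to the device.
slow = torch.ones(3, 4).to(device)

# After: the tensor is allocated directly on the target device,
# skipping the intermediate CPU allocation and copy.
fast = torch.ones(3, 4, device=device)
```

Both produce identical tensors; the second form simply avoids the extra allocation and transfer, which is where the ~2x speedup comes from.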

@@ -1548,7 +1548,6 @@ def add_sentence_boundary_token_ids(
The new mask for the tensor, taking into account the appended tokens
marking the beginning and end of the sentence.
"""
# TODO: matthewp, profile this transfer
epwalsh (Member Author) commented:


I benchmarked this. The function is actually faster as written than it is when keeping sequence_lengths on the GPU with:

sequence_lengths = mask.sum(dim=1).detach()
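For reference, a toy sketch of the two variants being compared (the mask here is illustrative, and I'm assuming the function keeps the lengths on the CPU, which is the transfer the `TODO` referred to):

```python
import torch

# A padded batch mask: row 0 has 3 real tokens, row 1 has 2.
mask = torch.tensor([[True, True, True, False],
                     [True, True, False, False]])

# Variant benchmarked against: lengths stay on the mask's device.
lengths_on_device = mask.sum(dim=1).detach()

# Variant the function keeps: lengths are transferred to host memory,
# which the benchmark showed to be faster overall for this function.
lengths_on_host = mask.sum(dim=1).detach().cpu()
```

The counterintuitive result is that the device-to-host transfer here is cheaper than the downstream work of indexing with device-resident lengths.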

@epwalsh epwalsh requested review from AkshitaB and dirkgr December 2, 2020 19:29

@dirkgr (Member) left a comment


Heh, cool!

@dirkgr dirkgr merged commit cec9209 into master Dec 2, 2020
@dirkgr dirkgr deleted the benchmark-transfers branch December 2, 2020 22:24

3 participants