Revise timing metrics collection to avoid unaccounted-for time #254
In the GPT-2 medium model, about 6% of the runtime for each step was reported as "[Other]", which is the difference between the total time for the run and the sum of individual operator timings. Upon analysis, this difference turned out to be a combination of:

- `Instant::now` overhead
- Time spent in `Graph::run_plan` before the start of the main loop

The `Instant::now` overhead on macOS (Intel) is about 30% rdtsc + fences and 70% call overhead, according to samply.
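As an aside, a minimal sketch of how an "[Other]" figure like this can be computed from per-operator timings. The names here (`unaccounted_time`, `step_total`, `op_timings`) are illustrative, not the crate's actual API:

```rust
use std::time::Duration;

/// Illustrative only: "[Other]" is the total time minus the time
/// attributed to individual operators.
fn unaccounted_time(step_total: Duration, op_timings: &[(String, Duration)]) -> Duration {
    let accounted: Duration = op_timings.iter().map(|(_, d)| *d).sum();
    step_total.saturating_sub(accounted)
}
```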
This commit addresses these as follows:

- The number of `Instant::now` calls has been reduced from 2 to 1 per step, with each call capturing the total time for the whole step, not just the call to `Operator::run`. This means that output processing for a step is now accounted for in the time for that step. A sketch of this pattern is shown after this list.
- Timings are now accumulated as `Duration` values. These are integer counts of nanoseconds, avoiding issues with rounding and float accumulation (see the second sketch below).
- The `Timer` utility type was removed as it is no longer used following these changes.
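A minimal sketch of the one-timestamp-per-step pattern from the first point. The names (`Step`, `run_step`, `process_outputs`, `time_steps`) are placeholders rather than the actual API; the idea is that a single `Instant::now` per iteration attributes everything between consecutive timestamps, including output processing, to that step:

```rust
use std::time::{Duration, Instant};

struct Step;

fn run_step(_step: &Step) { /* would invoke the operator here */ }
fn process_outputs(_step: &Step) { /* handle the step's outputs */ }

fn time_steps(plan: &[Step]) -> Vec<Duration> {
    let mut step_times = Vec::with_capacity(plan.len());
    // One timestamp before the loop, then one `Instant::now` per step.
    let mut last = Instant::now();
    for step in plan {
        run_step(step);
        process_outputs(step);
        // The elapsed time since the previous timestamp covers the whole
        // step, so output processing is no longer left unaccounted for.
        let now = Instant::now();
        step_times.push(now - last);
        last = now;
    }
    step_times
}
```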
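And a small illustration of the motivation for the second point: summing `Duration` values is exact integer arithmetic in nanoseconds, whereas repeatedly adding single-precision seconds rounds on every addition. The use of `f32` here is purely illustrative, not a claim about the previous code:

```rust
use std::time::Duration;

fn main() {
    let sample = Duration::from_nanos(1_000_000_123);

    // Integer nanosecond accumulation: exact.
    let exact: Duration = (0..10_000).map(|_| sample).sum();

    // Float accumulation: each addition rounds, and the error grows as the
    // running total becomes large relative to the increments.
    let mut approx = 0.0f32;
    for _ in 0..10_000 {
        approx += sample.as_secs_f32();
    }

    println!("Duration sum: {:?}", exact); // 10000.00123s
    println!("f32 sum:      {} s", approx);
}
```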