
Revise timing metrics collection to avoid unaccounted-for time #254

Merged: robertknight merged 1 commit into main from revise-timing-metrics on Jun 29, 2024

Conversation

robertknight (Owner) commented on Jun 29, 2024

In the GPT-2 medium model, about 6% of the runtime for each step was reported as "[Other]", which is the difference between the total time for the run and the sum of individual operator timings. Upon analysis, this difference turned out to be a combination of:

  1. The overhead of getting the time with `Instant::now`
  2. Rounding and accumulation errors due to storing timings in various resolutions and using single-precision float operations to sum them (illustrated in the sketch after this list)
  3. Setup time spent in `Graph::run_plan` before the start of the main loop
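
As a rough illustration of point 2 (a hypothetical example, not code from this repository), the sketch below sums a large number of identical short operator timings once as single-precision milliseconds and once as `Duration` values. The float total picks up rounding error, while the integer-nanosecond total stays exact:

```rust
use std::time::Duration;

fn main() {
    // A single (hypothetical) operator timing of ~12.3µs.
    let op_time = Duration::from_nanos(12_345);
    let calls = 1_000_000u32;

    // Accumulate in single-precision milliseconds, in the spirit of the old
    // approach described above: rounding error creeps into the total.
    let mut total_ms_f32 = 0.0f32;
    for _ in 0..calls {
        total_ms_f32 += op_time.as_secs_f32() * 1000.0;
    }

    // Accumulate as `Duration` (integer nanoseconds): the total is exact.
    let total_exact: Duration = (0..calls).map(|_| op_time).sum();

    println!("f32 sum:      {:.3} ms", total_ms_f32);
    println!("Duration sum: {:.3} ms", total_exact.as_secs_f64() * 1000.0);
}
```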

The `Instant::now` overhead on macOS (Intel) is roughly 30% `rdtsc` plus fences and 70% call overhead, according to samply.
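
For context, one quick way to get a ballpark per-call figure for this overhead on a given machine (a hypothetical micro-benchmark, not part of this change) is to time a tight loop of clock reads:

```rust
use std::hint::black_box;
use std::time::Instant;

fn main() {
    // Time a tight loop of clock reads and divide by the iteration count to
    // get a rough per-call cost for `Instant::now` on this machine.
    let iters = 1_000_000u32;
    let start = Instant::now();
    for _ in 0..iters {
        black_box(Instant::now());
    }
    println!("approx cost per Instant::now call: {:?}", start.elapsed() / iters);
}
```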

This commit addresses these as follows:

  • The number of Instant::now calls has been reduced from 2 to 1 per step, with each call capturing the total time for the whole step, not just the call to `Operator::run`. This means that output processing for a step is now accounted for in the time for that step (see the sketch below).
  • Operator timings are now represented and summed as `Duration` values. These are integer counts of nanoseconds, avoiding issues with rounding and float accumulation.
  • The run timing is now reported as the sum of operator timings, excluding setup before the main loop.

The `Timer` utility type was removed, as it is no longer used following these changes.
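
A minimal sketch of the resulting measurement pattern, using hypothetical stand-ins (`Step`, `run_operator`, `process_outputs`) rather than the real graph types, might look like this:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the real graph and operator types; only the
// timing pattern described above is illustrated here.
struct Step;

fn run_operator(_step: &Step) { /* Operator::run would be called here */ }
fn process_outputs(_step: &Step) { /* per-step output processing */ }

fn run_plan(steps: &[Step]) -> Duration {
    let mut op_timings: Vec<Duration> = Vec::with_capacity(steps.len());

    // One `Instant::now` call per step: the timestamp taken at the end of one
    // step doubles as the start of the next, so output processing is charged
    // to the step that produced the outputs.
    let mut last = Instant::now();
    for step in steps {
        run_operator(step);
        process_outputs(step);
        let now = Instant::now();
        op_timings.push(now - last);
        last = now;
    }

    // The reported run time is the sum of the per-step `Duration`s (integer
    // nanosecond counts), excluding any setup that happened before this loop.
    op_timings.iter().sum()
}

fn main() {
    let steps = vec![Step, Step, Step];
    println!("total: {:?}", run_plan(&steps));
}
```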

robertknight force-pushed the revise-timing-metrics branch from 4cb16a1 to c3cd3d8 on June 29, 2024 at 06:38
robertknight merged commit d7a79f6 into main on Jun 29, 2024
2 checks passed
robertknight deleted the revise-timing-metrics branch on June 29, 2024 at 06:45