Use a lower concurrency with more repetition for L0_memory_growth #7127

krishung5 · 2024-04-17T20:43:33Z

The test is failing because onnxruntime throws a CUDA OOM error. Decrease the concurrency and increase the repetition so that CUDA memory isn't exhausted, while PA still sends enough requests to show whether memory grows.
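The trade-off can be illustrated with a rough sketch. It is not the test's actual code: the function names, the numbers, and the simple model (each in-flight request holds a fixed amount of GPU memory, and each concurrent worker sends `repetitions` requests) are all assumptions made for illustration.

```python
# Hypothetical model, for illustration only: none of these names or numbers
# come from the L0_memory_growth test itself.

def total_requests(concurrency: int, repetitions: int) -> int:
    """Total requests sent if each concurrent worker sends `repetitions` requests."""
    return concurrency * repetitions


def peak_inflight_bytes(concurrency: int, bytes_per_request: int) -> int:
    """Rough upper bound on memory held at once, assuming a fixed cost per in-flight request."""
    return concurrency * bytes_per_request


# Assumed values: 64 MiB of GPU memory per in-flight request.
BYTES_PER_REQUEST = 64 * 1024 * 1024

# Hypothetical "before" and "after" settings with the same total request count.
before = {"concurrency": 20, "repetitions": 3}
after = {"concurrency": 5, "repetitions": 12}

for label, cfg in (("before", before), ("after", after)):
    sent = total_requests(cfg["concurrency"], cfg["repetitions"])
    peak = peak_inflight_bytes(cfg["concurrency"], BYTES_PER_REQUEST)
    print(f"{label}: {sent} requests, peak ~{peak // (1024 * 1024)} MiB in flight")
```

Under this model, both settings send the same number of requests, so a leak has the same opportunity to show up, but the lower-concurrency setting holds a quarter of the memory at any one time.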

krishung5 · 2024-04-23T18:31:18Z

Use a larger window to avoid the intermittent PA instability issue.

krishung5 requested a review from Tabrizian April 17, 2024 20:43

Tabrizian previously approved these changes Apr 17, 2024

krishung5 added 2 commits April 22, 2024 18:06

Lower concurrency with more repetition

821ed3e

Use larger window

57286a0

krishung5 dismissed Tabrizian’s stale review via 57286a0 April 23, 2024 01:06

krishung5 force-pushed the krish-fix-l0-mem-growth branch from 27d88b6 to 57286a0 Compare April 23, 2024 01:06

krishung5 requested a review from Tabrizian April 23, 2024 18:30

Tabrizian approved these changes Apr 23, 2024

krishung5 merged commit 7d1b015 into main Apr 23, 2024
3 checks passed

krishung5 deleted the krish-fix-l0-mem-growth branch April 23, 2024 19:49
