
feat: add flag enforce_max_duration #798

Merged
merged 17 commits into from
Mar 4, 2024
Conversation

anhappdev
Collaborator

@anhappdev anhappdev commented Oct 6, 2023

The logic is now:

  • if (result_min_duration_met && result_min_queries_met && early_stopping_met) => blue text
  • else if (result_min_duration_met && early_stopping_met) => purple text
  • else red text
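As a sketch, the branching above could look like this (C++ for illustration; the enum and function names are hypothetical, not the app's actual Dart code):

```cpp
// Sketch of the color rule described above. Names are illustrative only;
// the real implementation lives in the Flutter app.
enum class ResultColor { Blue, Purple, Red };

ResultColor resultColor(bool minDurationMet, bool minQueriesMet,
                        bool earlyStoppingMet) {
  if (minDurationMet && minQueriesMet && earlyStoppingMet)
    return ResultColor::Blue;    // fully valid result
  if (minDurationMet && earlyStoppingMet)
    return ResultColor::Purple;  // valid, but min queries skipped
  return ResultColor::Red;       // invalid
}
```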

The result screen will look like this:

@github-actions

github-actions bot commented Oct 6, 2023

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@sonarqubecloud

sonarqubecloud bot commented Oct 6, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (A)
Vulnerabilities: 0 (A)
Security Hotspots: 0 (A)
Code Smells: 0 (A)

Coverage: 0.0%
Duplication: 0.0%

@freedomtan
Contributor

freedomtan commented Oct 31, 2023

Let's test it: @freedomtan @AhmedTElthakeb @mohitmundhragithub

@anhappdev anhappdev marked this pull request as ready for review November 11, 2023 13:19
@anhappdev anhappdev requested a review from a team as a code owner November 11, 2023 13:19
@freedomtan
Contributor

I still saw invalid color for > 600 seconds. Is this expected?

@anhappdev
Collaborator Author

I still saw invalid color for > 600 seconds. Is this expected?

No, as per @pgmpablo157321

When you set enforce_max_duration = False, it won't fail when the maximum duration is reached.

@anhappdev
Collaborator Author

@pgmpablo157321 Can you check if the result is expected? It looks like the flag enforce_max_duration has no effect.

cd "/Users/anh/dev/mlcommons/mobile_app_open" && \
	bazel-bin/flutter/cpp/binary/main EXTERNAL super_resolution \
		--mode=PerformanceOnly \
		--output_dir="/Users/anh/dev/mlcommons/mobile_app_open/output" \
		--model_file="/Users/anh/dev/mlcommons/mobile_app_open/mobile_back_apple/dev-resources/edsr_final/converted/edsr_f32b5_fp32.tflite" \
		--lib_path="bazel-bin/mobile_back_tflite/cpp/backend_tflite/libtflitebackend.so" \
		--images_directory="/Users/anh/dev/mlcommons/mobile_app_open/mobile_back_apple/dev-resources/psnr/LR" \
		--ground_truth_directory="/Users/anh/dev/mlcommons/mobile_app_open/mobile_back_apple/dev-resources/psnr/HR" \
		--max_duration_ms=10000 \
		--min_duration_ms=100
================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 694327333
Result is : INVALID
  Min duration satisfied : Yes
  Min queries satisfied : Skipped
  Early stopping satisfied: NO
Recommendations:
 * The test exited early, before enough queries were issued.
   See the detailed log for why this may have occurred.
Early Stopping Result:
 * Only processed 17 queries.
 * Need to process at least 64 queries for early stopping.

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 1.52
QPS w/o loadgen overhead        : 1.53

Min latency (ns)                : 584926042
Max latency (ns)                : 699675166
Mean latency (ns)               : 653122000
50.00 percentile latency (ns)   : 679391667
90.00 percentile latency (ns)   : 694327333
95.00 percentile latency (ns)   : 699675166
97.00 percentile latency (ns)   : 699675166
99.00 percentile latency (ns)   : 699675166
99.90 percentile latency (ns)   : 699675166

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 100
max_duration (ms): 10000
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 148687905518835231
sample_index_rng_seed : 520418551913322573
schedule_rng_seed : 811580660758947900
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 80

No warnings encountered during test.

No errors encountered during test.


@freedomtan
Contributor

@pgmpablo157321 is it possible to add something like total running time/duration to the summary generated by the loadgen (so that we can check it's valid or not in the summary page)?

@AhmedTElthakeb
Contributor

Tested on an unsupported device; the run ends at 10 mins, but the result is still rendered as INVALID.

@freedomtan
Contributor

@freedomtan and @anhappdev to contact @pgmpablo157321 by e-mail

@pgmpablo157321

@anhappdev Sorry for the late reply, I don't usually check the issues in this repo

@pgmpablo157321 Can you check if the result is expected:
It looks like the flag enforce_max_duration has no effect.

I don't think that result is invalid because of the max duration. It seems it failed because of the early stopping requirements; that may sound related, but I don't think they are. Early stopping is a feature that was introduced a while ago; basically, it is a method to check that target_latency was reached, and looking at the log, it seems it was not. Could this be the problem? Are you able to test with a larger target_latency?

@freedomtan

@pgmpablo157321 is it possible to add something like total running time/duration to the summary generated by the loadgen (so that we can check it's valid or not in the summary page)?

Yes, we can report this value in the summary as well.

@freedomtan
Contributor

looks like it's not the target_latency.
@freedomtan to compile a debug version (-g) and use remote gdb to check the real reason why we got "invalid"

@freedomtan
Contributor

@pgmpablo157321 Can you check if the result is expected? It looks like the flag enforce_max_duration has no effect.

(command and loadgen summary quoted above)
I read the log carefully and did some tests. It turns out enforce_max_duration works, just not as we expected.
Before the enforce_max_duration flag, three conditions (min duration, min queries, and early stopping) needed to be satisfied to get a VALID result. With enforce_max_duration, the min queries check is skipped, but min duration and early stopping still need to be satisfied. Here, as shown in the log, 64 or more queries are needed for early stopping.

https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#appendix-early_stopping

Result is : INVALID
  Min duration satisfied : Yes
  Min queries satisfied : Skipped
  Early stopping satisfied: NO
Recommendations:
 * The test exited early, before enough queries were issued.
   See the detailed log for why this may have occurred.
Early Stopping Result:
 * Only processed 17 queries.
 * Need to process at least 64 queries for early stopping.
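In other words, the rule described above could be sketched like this (illustrative only, not loadgen's actual code; the function and parameter names are hypothetical):

```cpp
// Validity rule as described in this comment: when the min-query check is
// skipped, min duration and early stopping must still hold for a VALID result.
bool resultIsValid(bool minDurationMet, bool minQueriesMet,
                   bool earlyStoppingMet, bool skipMinQueries) {
  bool queriesOk = skipMinQueries || minQueriesMet;
  return minDurationMet && queriesOk && earlyStoppingMet;
}
```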

@pgmpablo157321

@freedomtan So is this behaviour correct? or what conditions do you expect to pass to have a VALID result?

@freedomtan
Contributor

@freedomtan So is this behaviour correct? or what conditions do you expect to pass to have a VALID result?

@pgmpablo157321 Personally, I think yes. We'll discuss it to see if the early stopping requirement is what we want.

@Mostelk and @mohitmundhragithub: what's your opinion?

@Mostelk

Mostelk commented Dec 7, 2023

@freedomtan So is this behaviour correct? or what conditions do you expect to pass to have a VALID result?

@pgmpablo157321 Personally, I think yes. We'll discuss it to see if the early stopping requirement is what we want.

@Mostelk and @mohitmundhragithub: what's your opinion?

The goal is to test the functionality of enforce_max_duration, so let us increase the duration to 45 minutes to satisfy the 64-query minimum for early stopping and see if we get a VALID result.

We can also discuss what a reasonable minimum query count for early stopping is in the policy meeting, rather than removing this condition.

@freedomtan
Contributor

@freedomtan So is this behaviour correct? or what conditions do you expect to pass to have a VALID result?

@pgmpablo157321 Personally, I think yes. We'll discuss it to see if the early stopping requirement is what we want.
@Mostelk and @mohitmundhragithub: what's your opinion?

The goal is to test the functionality of enforce_max_duration, so let us increase the duration to 45 minutes to satisfy the 64-query minimum for early stopping and see if we get a VALID result.

We can also discuss what a reasonable minimum query count for early stopping is in the policy meeting, rather than removing this condition.

Yes, I tested it before. If 64 queries are allowed by having a large enough max duration, then we'll get a VALID result.
Let's try to make all the benchmark items run at least 64 queries.

@pgmpablo157321: could you please merge the branch into the inference repo's main branch?

@anhappdev please rebase after that.

@freedomtan
Contributor

@pgmpablo157321: ping

@freedomtan
Contributor

@freedomtan to send email to check with @pgmpablo157321

@freedomtan freedomtan mentioned this pull request Jan 23, 2024
5 tasks
@freedomtan
Contributor

For @pgmpablo157321's comment about having a mobile-specific branch for loadgen:

  1. maybe we won't have such requirement(s) in the near future,
  2. we can maintain some local patches, as we do for TensorFlow, and upstream them later.

@freedomtan
Contributor

freedomtan commented Feb 14, 2024

summary("  Min queries satisfied : ", min_queries_met ? "Yes" : settings.enforce_max_duration? "NO" : "Skipped");

This one is problematic too. It says:

if (min_queries_met) {
  return "Yes";
} else {
  if (settings.enforce_max_duration) {
    return "NO";
  } else {
    return "Skipped";
  }
}

which means settings.enforce_max_duration must be false if we want to skip the min_queries_met check

So an easy fix is to rename enforce_max_duration to dont_skip_min_queries_if_max_duration_met, but that's a bit confusing :-)

@freedomtan
Contributor

@anhappdev I updated enforce_max_duration logic in another branch 7009512. The main fix is in inference's mobile_update branch, mlcommons/inference@ab284da

With that, we can have what @Mostelk proposed.

@Mostelk

Mostelk commented Feb 14, 2024

@anhappdev I updated enforce_max_duration logic in another branch 7009512. The main fix is in inference's mobile_update branch, mlcommons/inference@ab284da

With that, we can have what @Mostelk proposed.

Based on this fix, we should make enforce_max_duration true by default and not configurable.

@anhappdev
Collaborator Author

Based on this fix, we should make enforce_max_duration true by default and not configurable.

I assume this logic will not go into the main inference branch, and we will need a separate branch or a patch for this change.
If we hard-code the enforce_max_duration flag and don't want to change it, maybe it's better to remove the flag and update the code to always skip the min_queries_met condition (with a comment to explain why).
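A sketch of that alternative (hypothetical, not merged code): drop the flag entirely and treat the min-query count as satisfied whenever max_duration was reached:

```cpp
// Hypothetical sketch of the proposal above: no enforce_max_duration flag;
// the min-query requirement is waived once the run hit max_duration.
// Mobile runs are time-bounded, so reaching max_duration alone should not
// invalidate a result for issuing too few queries.
bool minQueriesSatisfiedOrWaived(bool minQueriesMet, bool maxDurationReached) {
  return minQueriesMet || maxDurationReached;
}
```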

@freedomtan
Contributor

Based on this fix, we should make enforce_max_duration true by default and not configurable.

I assume this logic will not go into the main inference branch, and we will need a separate branch or a patch for this change. If we hard-code the enforce_max_duration flag and don't want to change it, maybe it's better to remove the flag and update the code to always skip the min_queries_met condition (with a comment to explain why).

How about changing it back to false in the inference repo and setting it to true in our cpp code (and not making it configurable)? Would this increase the chance of merging back into the master branch of the inference repo?

@anhappdev
Collaborator Author

How about changing it back to false in the inference repo and setting it to true in our cpp code (and not making it configurable)? Would this increase the chance of merging back into the master branch of the inference repo?

I don't know. I think it's more a policy decision than a technical issue.

@freedomtan
Contributor

max_duration: max_duration should be long enough to allow the task to run 64 queries and get VALID results.

as commented in #701, let's see if we can get VALID for all the tasks on some lower-tier devices.
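A back-of-envelope way to pick max_duration, assuming single-stream queries issued back-to-back (64 is the early-stopping minimum quoted in the loadgen output above; the names below are illustrative):

```cpp
#include <cstdint>

// Minimum max_duration needed to fit the early-stopping query minimum,
// assuming queries run back-to-back at a given per-query latency.
constexpr int64_t kMinQueriesForEarlyStopping = 64;

constexpr int64_t minMaxDurationMs(int64_t perQueryLatencyMs) {
  return kMinQueriesForEarlyStopping * perQueryLatencyMs;
}
```

At roughly 700 ms per query (the super_resolution run above), 64 queries need about 45 s, so the 10 s max_duration used there cut the run off well short of 64 queries, while a 600 s (10 min) budget leaves ample headroom.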

@freedomtan
Contributor

I tested the tflite backend on a couple of devices. As we discussed in #701 before, 10 mins should be fine.
The following is from running the tflite backend on a Samsung Galaxy S22+ (Exynos 2200).

================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 867038750
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Skipped
  Early stopping satisfied: Yes
Early Stopping Result:
 * Processed at least 64 queries (745).
 * Would discard 54 highest latency queries.
 * Early stopping 90th percentile estimate: 867902383
 * Early stopping 99th percentile estimate: 894015430

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 1.24
QPS w/o loadgen overhead        : 1.24

Min latency (ns)                : 533973125
Max latency (ns)                : 894015430
Mean latency (ns)               : 807125504
50.00 percentile latency (ns)   : 860323867
90.00 percentile latency (ns)   : 867038750
95.00 percentile latency (ns)   : 869068672
97.00 percentile latency (ns)   : 870320625
99.00 percentile latency (ns)   : 874908985
99.90 percentile latency (ns)   : 894015430

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 600000
min_query_count : 1024
max_query_count : 0
qsl_rng_seed : 148687905518835231
sample_index_rng_seed : 520418551913322573
schedule_rng_seed : 811580660758947900
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 321

No warnings encountered during test.

No errors encountered during test.

@freedomtan
Contributor

@anhappdev please help make the color of VALID + early-stopping results purple, as in:

Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Skipped
  Early stopping satisfied: Yes

@anhappdev
Collaborator Author

@anhappdev I updated enforce_max_duration logic in another branch 7009512. The main fix is in inference's mobile_update branch, mlcommons/inference@ab284da

The logic is updated for mlperf_log_summary.txt but not for mlperf_log_detail.txt, where we parse the result.
@freedomtan Would you fix this? Or should I do it?

@anhappdev
Collaborator Author

@anhappdev please help to make the color of VALID + early stopped results to be purple as in

The result screen will look like this:

@freedomtan
Contributor

@anhappdev I updated enforce_max_duration logic in another branch 7009512. The main fix is in inference's mobile_update branch, mlcommons/inference@ab284da

The logic is updated for mlperf_log_summary.txt but not for mlperf_log_detail.txt, where we parse the result. @freedomtan Would you fix this? Or should I do it?

yes, please help fix it. I thought changing the log to a warning (instead of an error) was enough :-)

@anhappdev
Collaborator Author

@freedomtan I don't have write access on the inference repo. Please merge this PR
mlcommons/inference#1654


sonarqubecloud bot commented Mar 1, 2024

@anhappdev anhappdev marked this pull request as draft March 1, 2024 08:53
@anhappdev anhappdev marked this pull request as ready for review March 1, 2024 09:17
@anhappdev anhappdev merged commit 5ef666d into master Mar 4, 2024
21 checks passed
@anhappdev anhappdev deleted the anh/max-duration branch March 4, 2024 07:49
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2024