Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement extra resources for test actions #13996

Closed
wants to merge 1 commit into from

Conversation

scele
Copy link
Contributor

@scele scele commented Sep 15, 2021

Add support for user-specified resource types in the resource manager.
This generalizes the CPU, RAM and "test count" resource support for
other resource types such as the number of GPUs, available GPU memory,
the number of embedded devices connected to the host, etc.

The available amount of extra resources can be specified using the new
--local_extra_resources=<resourcename>=<amount> command line flag, which
is analoguous to the existing --local_cpu_resources and
--local_memory_resources flags.

Tests can then declare the amount of extra resources they need by using
a resources:<resourcename>:<amount> tag.

@google-cla google-cla bot added the cla: yes label Sep 15, 2021
@scele
Copy link
Contributor Author

scele commented Sep 15, 2021

Picking up #11963.

@meisterT meisterT requested a review from wilwell September 15, 2021 13:46
Copy link

@wilwell wilwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally code looks good!
But I mentioned some changes about constructors and namings.

/** How many extra resources an action requires for execution. */
public static final ParseableRequirement RESOURCES =
ParseableRequirement.create(
"resources:<str>:<int>",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we are using int instead of float? On ResourceSet we are using ImmutableMap<String, Float> extraResourceUsage for this purposes. Also resources could be float, e.g. cpu

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to match the cpu:<int> code above. Looks like there is some inconsistency around what can be a float and what can be an int:

  • Execution requirements: src/main/java/com/google/devtools/build/lib/actions/ExecutionRequirements.java parses int.
  • Test tags: src/main/java/com/google/devtools/build/lib/analysis/test/TestTargetProperties.java parses Float.
  • Command line flags: src/main/java/com/google/devtools/build/lib/util/ResourceConverter.java parses int or a Float in the expression multiplier:
  /** Description of the accepted inputs to the converter. */
  public static final String FLAG_SYNTAX =
      "an integer, or a keyword (\"auto\", \"HOST_CPUS\", \"HOST_RAM\"), optionally followed by "
          + "an operation ([-|*]<float>) eg. \"auto\", \"HOST_CPUS*.5\"";

In its current form, this change is matching the existing code: integers in execution requirements, floats in test tags, integer on the command line, and floats in the resource manager code. But I'm open to suggestions if we want to use slightly different conventions than what is used for CPU resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

floats feel like they'd be more flexible, I can make the change here in #16785.

}

if (value < 1) {
return "can't be zero or negative";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why some of values could not be zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also just to match the cpu:<int> parsing logic above.

Copy link
Contributor

@drewmacrae drewmacrae Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike with CPUs (that are required to orchestrate tests) it's likely GPU or embedded target based tests will be in test suites with unit tests that don't require newly defined resources. This should be changed to accept values of 0.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that we could accept zero values

@@ -43,12 +45,24 @@
/** The number of CPUs, or fractions thereof. */
private final double cpuUsage;

/** Map of extra resources mapping name of the resource to a value. */
private final ImmutableMap<String, Float> extraResourceUsage;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please mention what type of resources it could be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private static final ResourceSet MEDIUM_RESOURCES = ResourceSet.create(100, 1, 1);
private static final ResourceSet LARGE_RESOURCES = ResourceSet.create(300, 1, 1);
private static final ResourceSet ENORMOUS_RESOURCES = ResourceSet.create(800, 1, 1);
private static final ResourceSet SMALL_RESOURCES = ResourceSet.create(20, 1, null, 1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could ignore null field here. We have the constructor with only 3 fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -422,6 +477,6 @@ synchronized int getWaitCount() {

@VisibleForTesting
synchronized boolean isAvailable(double ram, double cpu, int localTestCount) {
return areResourcesAvailable(ResourceSet.create(ram, cpu, localTestCount));
return areResourcesAvailable(ResourceSet.create(ram, cpu, null, localTestCount));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could ignore null here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 55 to 65
this(memoryMb, cpuUsage, null, localTestCount);
}

private ResourceSet(double memoryMb, double cpuUsage, ImmutableMap<String, Float> extraResourceUsage, int localTestCount) {
this.memoryMb = memoryMb;
this.cpuUsage = cpuUsage;
if (extraResourceUsage == null) {
this.extraResourceUsage = ImmutableMap.of();
} else {
this.extraResourceUsage = extraResourceUsage;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this checks on null looks overcomplicated and it's better to avoid nulls in code. So I suggest mark extraResourceUsage in second constructor as @NonNull and use ImmutableMap.of() in the first constructor.
Also it's better to use 3 param constructor on other code where it's possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to work with a couple of caveats:

  • I had to add new dependencies in
  • Using @NonNull decorator in ResourceSet.create resulted in //src/test/shell/bazel:bazel_bootstrap_distfile_tar_test test failures with error: No generator for: (@io.reactivex.rxjava3.annotations.NonNull :: com.google.common.collect.ImmutableMap<java.lang.String,java.lang.Float>) (see CI job here). I removed @NonNull from that function and only kept in the constructor - do you think that's OK?

double memoryMb, double cpuUsage, int localTestCount) {
if (memoryMb == 0 && cpuUsage == 0 && localTestCount == 0) {
double memoryMb, double cpuUsage, ImmutableMap<String, Float> extraResourceUsage, int localTestCount) {
if (memoryMb == 0 && cpuUsage == 0 && (extraResourceUsage == null || extraResourceUsage.size() == 0) && localTestCount == 0) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove check on null.

help =
"Set the number of extra resources available to Bazel. "
+ "Takes in a string-float pair. Can be used multiple times to specify multiple "
+ "types of extra resources. Bazel will limit concurrently running test actions "
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned test actions here. If this param only for tests, then it's better to add test in its name. E.g. local_test_extra_resources. This is also applies to name of this field in classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess the implementation should not be restricted to tests, although I have not verified that it works for non-test actions as well. Let me investigate this a bit more. Would be nice to match the behavior and naming of --local_cpu_resources and --local_ram_resources which also apply to both test and build actions?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this resources are implied not only for test actions. Could you please then remove test from description or use text about tests as an example.

Comment on lines 457 to 467
for (Map.Entry<String, Float> resource : resources.getExtraResourceUsage().entrySet()) {
String key = (String)resource.getKey();
float used = (float)usedExtraResources.getOrDefault(key, 0f);
float requested = resource.getValue();
float available = (float)availableResources.getExtraResourceUsage().getOrDefault(key, 0f);
float epsilon = 0.0001f; // Account for possible rounding errors.
if (requested != 0.0 && used != 0.0 && requested + used > available + epsilon) {
extraResourcesIsAvailable = false;
break;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's better to move this code in function to make code more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@scele scele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @wilwell! Replied to some of the comments, will address the rest of them in the next patch I push.

/** How many extra resources an action requires for execution. */
public static final ParseableRequirement RESOURCES =
ParseableRequirement.create(
"resources:<str>:<int>",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to match the cpu:<int> code above. Looks like there is some inconsistency around what can be a float and what can be an int:

  • Execution requirements: src/main/java/com/google/devtools/build/lib/actions/ExecutionRequirements.java parses int.
  • Test tags: src/main/java/com/google/devtools/build/lib/analysis/test/TestTargetProperties.java parses Float.
  • Command line flags: src/main/java/com/google/devtools/build/lib/util/ResourceConverter.java parses int or a Float in the expression multiplier:
  /** Description of the accepted inputs to the converter. */
  public static final String FLAG_SYNTAX =
      "an integer, or a keyword (\"auto\", \"HOST_CPUS\", \"HOST_RAM\"), optionally followed by "
          + "an operation ([-|*]<float>) eg. \"auto\", \"HOST_CPUS*.5\"";

In its current form, this change is matching the existing code: integers in execution requirements, floats in test tags, integer on the command line, and floats in the resource manager code. But I'm open to suggestions if we want to use slightly different conventions than what is used for CPU resources?

}

if (value < 1) {
return "can't be zero or negative";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also just to match the cpu:<int> parsing logic above.

help =
"Set the number of extra resources available to Bazel. "
+ "Takes in a string-float pair. Can be used multiple times to specify multiple "
+ "types of extra resources. Bazel will limit concurrently running test actions "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess the implementation should not be restricted to tests, although I have not verified that it works for non-test actions as well. Let me investigate this a bit more. Would be nice to match the behavior and naming of --local_cpu_resources and --local_ram_resources which also apply to both test and build actions?

@scele scele force-pushed the extra_resources branch 2 times, most recently from 736c7a1 to 4478fca Compare September 22, 2021 13:41
Add support for user-specified resource types in the resource manager.
This generalizes the CPU, RAM and "test count" resource support for
other resource types such as the number of GPUs, available GPU memory,
the number of embedded devices connected to the host, etc.

The available amount of extra resources can be specified using the new
--local_extra_resources=<resourcename>=<amount> command line flag, which
is analoguous to the existing --local_cpu_resources and
--local_memory_resources flags.

Tests can then declare the amount of extra resources they need by using
a "resources:<resourcename>:<amount>" tag.
@joeljeske
Copy link
Contributor

@scele thanks for your effort in getting this ready! I would be very excited to see this land; I see a lot of broad value in having full support for declaring resources needed for tests.

Do you plan to pick this back up and greening it up?

Copy link

@wilwell wilwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sync the changes from the repo. These files have a lot of changes during last month.

@@ -137,6 +141,10 @@ public static ResourceManager instance() {
// definition in the ResourceSet class.
private double usedRam;

// Used amount of extra resources. Corresponds to the extra resource
// definition in the ResourceSet class.
private Map<String, Float> usedExtraResources;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please sync the changes from the repository. This file had significant changes during last month.

help =
"Set the number of extra resources available to Bazel. "
+ "Takes in a string-float pair. Can be used multiple times to specify multiple "
+ "types of extra resources. Bazel will limit concurrently running test actions "
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this resources are implied not only for test actions. Could you please then remove test from description or use text about tests as an example.

@@ -124,6 +124,7 @@ java_library(
"//third_party:flogger",
"//third_party:guava",
"//third_party:jsr305",
"//third_party:rxjava3",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this library?

}

if (value < 1) {
return "can't be zero or negative";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that we could accept zero values

return ImmutableMap.copyOf(extraResources);
}

public ResourceSet getLocalResourceUsage(Label label, boolean usingLocalTestJobs)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please reaarange the methods to make the diff between two version more clear?

@sgowroji sgowroji added team-Local-Exec Issues and PRs for the Execution (Local) team awaiting-review PR is awaiting review from an assigned reviewer awaiting-user-response Awaiting a response from the author and removed awaiting-review PR is awaiting review from an assigned reviewer labels Jun 28, 2022
@sgowroji
Copy link
Member

sgowroji commented Jul 4, 2022

Hello @scele, Can you please resolve conflicts and above comments for the requested changes. Thank you!

@sgowroji
Copy link
Member

We are marking the above PR as stale because it has not had any recent activity from many days. It will be closed in 7 days if there is no further activity occurs. Thank you.

@scele
Copy link
Contributor Author

scele commented Jul 15, 2022

Yeah, sorry, I'm on vacation now and can't really commit to any finite date to push this to the finish line. So closing this PR for now.

@scele scele closed this Jul 15, 2022
@drewmacrae drewmacrae mentioned this pull request Nov 17, 2022
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Nov 17, 2022
Addressed comments on bazelbuild#13996

Signed-off-by: Drew Macrae <[email protected]>
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Nov 22, 2022
Addressed comments on bazelbuild#13996
Fixed issues in tests and built and tested with
lowRISC/opentitan#16436

Signed-off-by: Drew Macrae <[email protected]>
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Nov 30, 2022
Addressed comments on bazelbuild#13996
Fixed issues in tests and built and tested with
lowRISC/opentitan#16436

Signed-off-by: Drew Macrae <[email protected]>
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Dec 10, 2022
Addressed comments on bazelbuild#13996
Fixed issues in tests and built and tested with
lowRISC/opentitan#16436

Signed-off-by: Drew Macrae <[email protected]>
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Dec 10, 2022
Addressed comments on bazelbuild#13996
Fixed issues in tests and built and tested with
lowRISC/opentitan#16436

Signed-off-by: Drew Macrae <[email protected]>
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Dec 10, 2022
Addressed comments on bazelbuild#13996
Fixed issues in tests and built and tested with
lowRISC/opentitan#16436

Signed-off-by: Drew Macrae <[email protected]>
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Dec 10, 2022
Addressed comments on bazelbuild#13996
Fixed issues in tests and built and tested with
lowRISC/opentitan#16436

Signed-off-by: Drew Macrae <[email protected]>
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Dec 10, 2022
Addressed comments on bazelbuild#13996
Fixed issues in tests and built and tested with
lowRISC/opentitan#16436

Signed-off-by: Drew Macrae <[email protected]>
copybara-service bot pushed a commit that referenced this pull request Dec 30, 2022
This recreates a [closed PR](#13996) to implement extra resources which we're hoping to use in lowRISC/opentitan#16436

Fixes:#16817

Closes #16785.

PiperOrigin-RevId: 498557024
Change-Id: I60d8f8f4a4a02748147cabb4cd60a2a9b95a2c68
drewmacrae pushed a commit to drewmacrae/bazel that referenced this pull request Jan 17, 2023
This recreates a [closed PR](bazelbuild#13996) to implement extra resources which we're hoping to use in lowRISC/opentitan#16436

Fixes:bazelbuild#16817

Closes bazelbuild#16785.

PiperOrigin-RevId: 498557024
Change-Id: I60d8f8f4a4a02748147cabb4cd60a2a9b95a2c68
@drewmacrae drewmacrae mentioned this pull request Jan 17, 2023
ShreeM01 added a commit that referenced this pull request Jan 19, 2023
This recreates a [closed PR](#13996) to implement extra resources which we're hoping to use in lowRISC/opentitan#16436

Fixes:#16817

Closes #16785.

PiperOrigin-RevId: 498557024
Change-Id: I60d8f8f4a4a02748147cabb4cd60a2a9b95a2c68

Co-authored-by: kshyanashree <[email protected]>
hvadehra pushed a commit that referenced this pull request Feb 14, 2023
This recreates a [closed PR](#13996) to implement extra resources which we're hoping to use in lowRISC/opentitan#16436

Fixes:#16817

Closes #16785.

PiperOrigin-RevId: 498557024
Change-Id: I60d8f8f4a4a02748147cabb4cd60a2a9b95a2c68
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-response Awaiting a response from the author cla: yes team-Local-Exec Issues and PRs for the Execution (Local) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants