Skip to content

Commit

Permalink
[SPARK-33928][SPARK-23365][TEST][CORE] Fix flaky o.a.s.ExecutorAlloca…
Browse files Browse the repository at this point in the history
…tionManagerSuite - " Don't update target num executors when killing idle executors"

### What changes were proposed in this pull request?

Use the testing mode for the test to fix the flaky.

### Why are the changes needed?

The test is flaky:

```scala
[info] - SPARK-23365 Don't update target num executors when killing idle executors *** FAILED *** (126 milliseconds)
[info] 1 did not equal 2 (ExecutorAllocationManagerSuite.scala:1615)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
[info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
[info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
[info] at org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$84(ExecutorAllocationManagerSuite.scala:1617)
...
```
The root cause should be the same as apache#29773 since the test run under non-testing mode.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually checked. Flaky is gone by running the test hundreds of times after this fix.

Closes apache#30956 from Ngone51/fix-flaky-SPARK-23365.

Authored-by: yi.wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
  • Loading branch information
Ngone51 authored and cloud-fan committed Dec 29, 2020
1 parent f7bdea3 commit 1ef7ddd
Showing 1 changed file with 3 additions and 5 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -1588,7 +1588,7 @@ class ExecutorAllocationManagerSuite extends SparkFunSuite {
test("SPARK-23365 Don't update target num executors when killing idle executors") {
val clock = new ManualClock()
val manager = createManager(
createConf(1, 2, 1).set(config.DYN_ALLOCATION_TESTING, false),
createConf(1, 2, 1),
clock = clock)

when(client.requestTotalExecutors(any(), any(), any())).thenReturn(true)
Expand Down Expand Up @@ -1616,19 +1616,17 @@ class ExecutorAllocationManagerSuite extends SparkFunSuite {
clock.advance(1000)
manager invokePrivate _updateAndSyncNumExecutorsTarget(clock.nanoTime())
assert(numExecutorsTargetForDefaultProfileId(manager) === 1)
verify(client, never).killExecutors(any(), any(), any(), any())
assert(manager.executorMonitor.executorsPendingToRemove().isEmpty)

// now we cross the idle timeout for executor-1, so we kill it. the really important
// thing here is that we do *not* ask the executor allocation client to adjust the target
// number of executors down
when(client.killExecutors(Seq("executor-1"), false, false, false))
.thenReturn(Seq("executor-1"))
clock.advance(3000)
schedule(manager)
assert(maxNumExecutorsNeededPerResourceProfile(manager, defaultProfile) === 1)
assert(numExecutorsTargetForDefaultProfileId(manager) === 1)
// here's the important verify -- we did kill the executors, but did not adjust the target count
verify(client).killExecutors(Seq("executor-1"), false, false, false)
assert(manager.executorMonitor.executorsPendingToRemove() === Set("executor-1"))
}

test("SPARK-26758 check executor target number after idle time out ") {
Expand Down

0 comments on commit 1ef7ddd

Please sign in to comment.