
[CI] ActivateWatchTests fails occasionally on master not able to delete index #66495

Closed · breskeby opened this issue Dec 17, 2020 · 13 comments · Fixed by #67554 or #82157
Labels: :Data Management/Watcher · Team:Data Management (meta label for data/management team) · >test-failure (Triaged test failures from CI)
@breskeby (Contributor)

Seeing this fail occasionally (4 out of ~1200 runs in 7 days). I wonder if this points to a specific issue with the test or with the infrastructure, as the error seems to indicate some sort of permission issue?

Build scan:

https://gradle-enterprise.elastic.co/s/usis5fhbuqx6y/tests/:x-pack:plugin:watcher:internalClusterTest/org.elasticsearch.xpack.watcher.transport.action.activate.ActivateWatchTests/testDeactivateAndActivate#1

Repro line:

./gradlew ':x-pack:plugin:watcher:internalClusterTest' --tests "org.elasticsearch.xpack.watcher.transport.action.activate.ActivateWatchTests.testDeactivateAndActivate" -Dtests.seed=FE8DAE73AB0A616A -Dtests.security.manager=true -Dtests.locale=ru-RU -Dtests.timezone=US/Central -Druntime.java=11

Reproduces locally?:

no.

Applicable branches:

master

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?search.relativeStartTime=P7D&search.timeZoneId=Europe/Berlin&tests.container=org.elasticsearch.xpack.watcher.transport.action.activate.ActivateWatchTests&tests.sortField=FAILED&tests.test=testDeactivateAndActivate&tests.unstableOnly=true
Failure excerpt:

java.lang.IllegalArgumentException: index [.ds-.watcher-history-14-2020.12.17-000001] is the write index for data stream [.watcher-history-14] and cannot be deleted
@breskeby added the >test-failure (Triaged test failures from CI) label on Dec 17, 2020
@elasticmachine added the Team:Data Management (meta label for data/management team) label on Dec 18, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-core-features (Team:Core/Features)

@martijnvg self-assigned this on Jan 6, 2021
@martijnvg (Member)

During teardown, the test attempts to delete the write index of a data stream. This is prohibited: a data stream's write index must always exist (other backing indices can be deleted). The data stream should be removed in its entirety, which is what is supposed to happen via AbstractWatcherIntegrationTestCase#_cleanup(...), but somehow the test infra tries to delete just the write index. Not sure why this happens yet.
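For reference, a minimal sketch of the distinction, from inside an internal-cluster test (assuming the 7.x x-pack DeleteDataStreamAction class; a sketch, not the actual _cleanup code):

    import org.elasticsearch.xpack.core.action.DeleteDataStreamAction;

    // Deleting a backing write index directly, e.g.
    //   client().admin().indices().prepareDelete(".ds-.watcher-history-14-2020.12.17-000001").get();
    // is rejected by MetadataDeleteIndexService with the IllegalArgumentException
    // shown in the failure excerpt. Deleting the data stream itself is allowed
    // and removes every backing index, write index included:
    DeleteDataStreamAction.Request request =
        new DeleteDataStreamAction.Request(new String[] { ".watcher-history-*" });
    client().execute(DeleteDataStreamAction.INSTANCE, request).actionGet();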

@martijnvg (Member)

It looks like the watcher history data stream gets created after watcher has been stopped and the previous data stream has been removed:

1> [2020-12-17T03:30:00,522][INFO ][o.e.x.w.WatcherLifeCycleService] [node_s0] watcher has stopped
1> [2020-12-17T03:30:00,525][INFO ][o.e.x.w.t.a.a.ActivateWatchTests] [testDeactivateAndActivate] waiting to stop watcher, current states [Tuple [v1=node_s3 (0), v2=STOPPED], Tuple [v1=node_s2 (0), v2=STOPPED], Tuple [v1=node_s0 (0), v2=STOPPED], Tuple [v1=node_s1 (0), v2=STOPPED]]
1> [2020-12-17T03:30:00,527][INFO ][o.e.x.d.a.DeleteDataStreamTransportAction] [node_s0] removing data stream [.watcher-history-14]
1> [2020-12-17T03:30:00,527][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_s0] [.ds-.watcher-history-14-2020.12.17-000001/S9MRwpX-SmWEaSLkl4DrNw] deleting index
1> [2020-12-17T03:30:00,608][INFO ][o.e.x.w.t.a.a.ActivateWatchTests] [testDeactivateAndActivate] [ActivateWatchTests#testDeactivateAndActivate]: cleaning up after test
1> [2020-12-17T03:30:00,622][INFO ][o.e.c.m.MetadataCreateIndexService] [node_s0] [.ds-.watcher-history-14-2020.12.17-000001] creating index, cause [initialize_data_stream], templates [.watch-history-14], shards [1]/[0]
1> [2020-12-17T03:30:00,624][INFO ][o.e.c.m.MetadataCreateDataStreamService] [node_s0] adding data stream [.watcher-history-14] with write index [.ds-.watcher-history-14-2020.12.17-000001] and backing indices []
1> [2020-12-17T03:30:00,720][INFO ][o.e.x.i.IndexLifecycleTransition] [node_s0] moving index [.ds-.watcher-history-14-2020.12.17-000001] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [watch-history-ilm-policy]
1> [2020-12-17T03:30:00,757][INFO ][o.e.c.r.a.AllocationService] [node_s0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.ds-.watcher-history-14-2020.12.17-000001][0]]])." previous.health="YELLOW" reason="shards started [[.ds-.watcher-history-14-2020.12.17-000001][0]]"
1> [2020-12-17T03:30:00,837][INFO ][o.e.c.m.MetadataMappingService] [node_s0] [.ds-.watcher-history-14-2020.12.17-000001/1Ysc1vvwSZ2mHgNjxAGjgg] update_mapping [_doc]

I suspect this happens because write requests for watch history are still in flight after watcher has been stopped.
The exception is most likely thrown when the test cluster is wiped (which deletes all indices), which happens after AbstractWatcherIntegrationTestCase has removed the watcher history data stream and stopped watcher.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 15, 2021
by waiting for all tasks to complete before deleting watcher history data stream.

Closes elastic#66495
martijnvg added a commit that referenced this issue Jan 15, 2021
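A hedged sketch of what that "wait for all tasks" guard can look like in an internal-cluster test; the indices:data/write/bulk* action filter and the placement are assumptions, not a verbatim copy of #67554:

    import org.elasticsearch.action.admin.cluster.node.tasks.list.ListTasksResponse;

    import static org.hamcrest.Matchers.empty;

    // Before deleting the watcher history data stream, poll the tasks API until
    // no bulk writes (e.g. straggling watch-history documents) are in flight,
    // so a late history write cannot recreate the data stream mid-cleanup.
    assertBusy(() -> {
        ListTasksResponse tasks = client().admin().cluster().prepareListTasks()
            .setActions("indices:data/write/bulk*") // assumed filter for history writes
            .get();
        assertThat(tasks.getTasks(), empty());
    });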
@hendrikmuhs

I see the same stack trace in a failure today:

https://gradle-enterprise.elastic.co/s/5hq6kmhcgi53i/console-log?anchor=7304

@hendrikmuhs reopened this on Feb 11, 2021
@henningandersen (Contributor)

henningandersen commented Mar 30, 2021

This happened in a PR build here:
https://gradle-enterprise.elastic.co/s/gwj4needgfl7s

I'm pretty sure it is not related to my changes, since the only change I made was to merge in master.

@pgomulka (Contributor)

Another failure, on 6.8:
https://gradle-enterprise.elastic.co/s/3sfkpghtsfryc

./gradlew ':x-pack:plugin:watcher:unitTest' -Dtests.seed=12EADBE525B1E30A -Dtests.class=org.elasticsearch.xpack.watcher.transport.action.activate.ActivateWatchTests -Dtests.method="testDeactivateAndActivate" -Dtests.security.manager=true -Dtests.locale=nl -Dtests.timezone=Europe/Saratov -Dcompiler.java=11 -Druntime.java=8

Does not reproduce locally.

@dimitris-athanasiou (Contributor)

And another on master: https://gradle-enterprise.elastic.co/s/haa4hv337tiaq

Also, this seems to be the same issue as #61538. I'll close that one, as this one has had more discussion.

@ywangd (Member)

ywangd commented Jul 19, 2021

Another one on master https://gradle-enterprise.elastic.co/s/hy5u6idtonkea

@astefan (Contributor)

astefan commented Sep 3, 2021

Happened again today on master: https://gradle-enterprise.elastic.co/s/t3me7fp2rks7w

REPRODUCE WITH: ./gradlew ':x-pack:plugin:watcher:internalClusterTest' --tests "org.elasticsearch.xpack.watcher.transport.action.activate.ActivateWatchTests.testDeactivateAndActivate" -Dtests.seed=C7DEBD15F3F219A9 -Dtests.locale=bg-BG -Dtests.timezone=Asia/Omsk -Druntime.java=11

org.elasticsearch.xpack.watcher.transport.action.activate.ActivateWatchTests > testLoadWatchWithoutAState FAILED
    java.lang.IllegalArgumentException: index [.ds-.watcher-history-14-2021.09.03-000001] is the write index for data stream [.watcher-history-14] and cannot be deleted
        at __randomizedtesting.SeedInfo.seed([C7DEBD15F3F219A9:300CDB785EC3DD55]:0)
        at org.elasticsearch.cluster.metadata.MetadataDeleteIndexService.deleteIndices(MetadataDeleteIndexService.java:85)
        at org.elasticsearch.cluster.metadata.MetadataDeleteIndexService$1.execute(MetadataDeleteIndexService.java:67)
        at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:48)
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:745)
        at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:361)
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:224)
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:161)
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:139)
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:177)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:259)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:222)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

@benwtrent (Member)

Happened again, on the main branch: https://gradle-enterprise.elastic.co/s/y6r4m2tn4ogo2

Trace:

org.elasticsearch.xpack.watcher.transport.action.activate.ActivateWatchTests > testDeactivateAndActivate FAILED
    java.lang.IllegalArgumentException: index [.ds-.watcher-history-14-2021.11.29-000001] is the write index for data stream [.watcher-history-14] and cannot be deleted
        at __randomizedtesting.SeedInfo.seed([97D850CBD028BB52:378028361F716D2E]:0)
        at org.elasticsearch.cluster.metadata.MetadataDeleteIndexService.deleteIndices(MetadataDeleteIndexService.java:90)
        at org.elasticsearch.cluster.metadata.MetadataDeleteIndexService$1.execute(MetadataDeleteIndexService.java:68)
        at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:51)
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:828)
        at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:395)
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:235)
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:167)
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:146)
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:202)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:712)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:262)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:225)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

@stu-elastic (Contributor)

stu-elastic commented Dec 2, 2021

Another instance: https://gradle-enterprise.elastic.co/s/zmyadhq4xjppw, same trace.

@jkakavas (Member)

Happened again: https://gradle-enterprise.elastic.co/s/e5dxrbejgnibk. I'll mute this on master.

jakelandis added a commit that referenced this issue Jan 4, 2022
This commit attempts to fix a set of Watcher tests that can
fail due to unexpected execution of Watches after Watcher has been stopped.

The theory here is that a Watch can be queued but not fully executed
when Watcher is shut down; the test does some cleanup, then the
queued Watch finishes execution and causes some additional cleanup
to fail.

The change here ensures that when Watcher is stopped from AbstractWatcherIntegrationTestCase,
it will also wait until there are no more current Watches executing.

closes #66495
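A hedged sketch of that wait, polling the watcher stats API with current watches included (the real change lives in AbstractWatcherIntegrationTestCase; exact class and accessor names here are assumptions, not a verbatim copy of the commit):

    import org.elasticsearch.xpack.core.watcher.transport.actions.stats.WatcherStatsAction;
    import org.elasticsearch.xpack.core.watcher.transport.actions.stats.WatcherStatsRequest;
    import org.elasticsearch.xpack.core.watcher.transport.actions.stats.WatcherStatsResponse;

    import static org.hamcrest.Matchers.empty;

    // After asking Watcher to stop: wait until no watch execution snapshot
    // remains on any node, i.e. no Watch is still mid-execution, before any
    // cleanup that deletes the history data stream.
    assertBusy(() -> {
        WatcherStatsRequest request = new WatcherStatsRequest();
        request.includeCurrentWatches(true);
        WatcherStatsResponse stats =
            client().execute(WatcherStatsAction.INSTANCE, request).actionGet();
        for (WatcherStatsResponse.Node node : stats.getNodes()) {
            assertThat(node.getSnapshots(), empty());
        }
    });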
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue Jan 4, 2022
elasticsearchmachine pushed a commit that referenced this issue Jan 4, 2022
astefan pushed a commit to astefan/elasticsearch that referenced this issue Jan 7, 2022
astefan pushed a commit to astefan/elasticsearch that referenced this issue Jan 7, 2022