[BUG] org.opensearch.test.telemetry.tracing.validators.AllSpansHaveUniqueId fails with java.lang.OutOfMemoryError sporadically #12615

Closed
reta opened this issue Mar 12, 2024 · 6 comments · Fixed by #13054
Labels: bug, flaky-test, Other, v2.14.0, v3.0.0

Comments

@reta
Collaborator

reta commented Mar 12, 2024

Describe the bug

org.opensearch.test.telemetry.tracing.validators.AllSpansHaveUniqueId fails sporadically with java.lang.OutOfMemoryError:

java.lang.OutOfMemoryError: Java heap space
	at __randomizedtesting.SeedInfo.seed([11E79186241F3EB2:AD561042C55D1621]:0)
	at java.base/java.util.HashMap.newNode(HashMap.java:1909)
	at java.base/java.util.HashMap.putVal(HashMap.java:637)
	at java.base/java.util.HashMap.put(HashMap.java:618)
	at java.base/java.util.HashSet.add(HashSet.java:229)
	at org.opensearch.test.telemetry.tracing.validators.AllSpansHaveUniqueId.validate(AllSpansHaveUniqueId.java:42)
	at org.opensearch.test.telemetry.tracing.TelemetryValidators.validate(TelemetryValidators.java:37)
	at org.opensearch.test.telemetry.tracing.StrictCheckSpanProcessor.validateTracingStateOnShutdown(StrictCheckSpanProcessor.java:79)
	at org.opensearch.test.OpenSearchTestClusterRule.afterClass(OpenSearchTestClusterRule.java:99)
	at org.opensearch.test.OpenSearchTestClusterRule.initializeSuiteScope(OpenSearchTestClusterRule.java:399)
	at org.opensearch.test.OpenSearchTestClusterRule.before(OpenSearchTestClusterRule.java:168)
	at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:365)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)

Related component

Other

To Reproduce

Not easy to reproduce, but please see:

Expected behavior

No java.lang.OutOfMemoryError should be raised

Additional Details

Plugins
Standard


Host/Environment (please complete the following information):

  • CI

Additional context

@reta
Collaborator Author

reta commented Mar 12, 2024

@Gaganjuneja please take a look at your convenience, thank you

@Gaganjuneja
Contributor

> @Gaganjuneja please take a look at your convenience, thank you

Sure, thanks!

@mch2
Member

mch2 commented Mar 22, 2024

I think this is causing flakiness across the board, e.g. #12801, along with other symptoms in tests such as "slow network thread...".

In the linked test I could see up to 100k spans passed to this validate method.

@mch2
Member

mch2 commented Mar 22, 2024

Have raised #12877 to disable this plugin until this is resolved.

Looks like these spans are added for at least every write/replication request and not cleared until after the test executes.
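
For illustration only, here is a minimal sketch of why this pattern gets expensive. It is not the actual OpenSearch code: `SpanIdUniquenessSketch`, `RecordedSpan`, and `findDuplicateIds` are hypothetical names, and the only thing taken from the stack trace above is that the validator adds to a `HashSet`. The sketch assumes every recorded span stays in a list until validation, and that validation then allocates one more set entry per span.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SpanIdUniquenessSketch {

    /** Hypothetical stand-in for a recorded span; not the OpenSearch test framework type. */
    static final class RecordedSpan {
        final String spanId;

        RecordedSpan(String spanId) {
            this.spanId = spanId;
        }
    }

    /**
     * Returns the span IDs that occur more than once.
     * The set grows by one entry per distinct span, so memory use is
     * proportional to the number of spans retained before validation.
     */
    static List<String> findDuplicateIds(List<RecordedSpan> spans) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (RecordedSpan span : spans) {
            // Set.add returns false when the ID was already present.
            if (!seen.add(span.spanId)) {
                duplicates.add(span.spanId);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Simulate spans accumulating for the whole test run (assumed behavior).
        List<RecordedSpan> spans = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            spans.add(new RecordedSpan("span-" + i));
        }
        spans.add(new RecordedSpan("span-42")); // one deliberate duplicate

        System.out.println("spans held: " + spans.size());
        System.out.println("duplicates: " + findDuplicateIds(spans));
    }
}
```

If the real setup behaves like this, the list and the set both hold every span at shutdown, which would be consistent with the heap running out inside `HashSet.add`.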

@mch2
Member

mch2 commented Mar 25, 2024

Related - #12877 (comment)
@Gaganjuneja, do you know why we don't see any telemetry test failures with this disabled?

@Gaganjuneja
Contributor

This is to verify if all the spans are closed. Telemetry tests are not dependent on this feature.

@reta reta added v3.0.0 Issues and PRs related to version 3.0.0 v2.14.0 labels Apr 17, 2024