Docs test cluster going out of memory on reference/mapping/types/geo-shape docs #30811

Closed
colings86 opened this issue May 23, 2018 · 7 comments
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes :Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team >test-failure Triaged test failures from CI

Comments

@colings86
Contributor

Build URLs:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/6353/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+release-tests/768/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/1953/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=opensuse/2453/console

Test Class: org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT
The list of failing test methods is different every time, but the following geo_shape tests always seem to be in it:

  • "test {yaml=reference/mapping/types/geo-shape/line_325}"
  • "test {yaml=reference/mapping/types/geo-shape/line_325}"

The out-of-memory error seems to be consistent across the builds:

22:54:32 java.lang.OutOfMemoryError: Java heap space
22:54:32 	at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:225) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:245) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:46) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:252) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:271) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:787) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:427) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:391) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1584) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1203) ~[lucene-core-7.4.0-snapshot-cc2ee23050.jar:7.4.0-snapshot-cc2ee23050 cc2ee2305001a49536886653d2133ee1a3b51b82 - nhat - 2018-05-21 16:56:19]
22:54:32 	at org.elasticsearch.index.engine.InternalEngine.addDocs(InternalEngine.java:970) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:916) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:770) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:699) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:675) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:640) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$executeIndexRequestOnPrimary$2(TransportShardBulkAction.java:568) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$1948/1498048471.get(Unknown Source) ~[?:?]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:587) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:566) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequest(TransportShardBulkAction.java:142) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:248) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:125) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:112) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:74) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1018) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:996) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:103) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
22:54:32 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:357) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
@colings86 colings86 added :Delivery/Build Build or test infrastructure :Analytics/Geo Indexing, search aggregations of geo points and shapes >test-failure Triaged test failures from CI labels May 23, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@colings86
Contributor Author

Contrary to what the stack trace above suggests, this does not actually appear to be a Lucene problem. According to the heap dump obtained from the latest CI job, the memory is being used by netty:
[Image: screen shot 2018-05-23 at 14 21 06]

We suspect the problem might be due to 31251c9 (PR #30695) which would explain why we only see the OOMEs on master.

I have raised #30813 which will revert the commit we believe is causing these OOMEs.
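
For anyone wanting to check heap occupancy from inside a JVM while investigating something like this, here is a minimal sketch using the standard `java.lang.management` API. The class name `HeapReport` and its methods are hypothetical and are not part of Elasticsearch or its test infrastructure. It only reports how full the heap is; it does not show which objects retain the memory, which is what the heap dump is for.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class HeapReport {

    /** Bytes of heap currently in use, as reported by the JVM. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Maximum heap size in bytes, or -1 if the JVM reports it as undefined. */
    static long maxHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    /** One-line summary of heap occupancy. */
    static String describe() {
        long used = usedHeapBytes();
        long max = maxHeapBytes();
        if (max < 0) {
            return "heap used=" + used + " bytes, max=undefined";
        }
        return "heap used=" + used + " bytes, max=" + max + " bytes";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

An `OutOfMemoryError` stack trace only shows the allocation that happened to fail, so a number like this says when the heap is nearly full but not who filled it.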

@danielmitterdorfer
Member

fyi @colings86 I raised #30801 this morning.

@colings86
Contributor Author

@danielmitterdorfer ah ok, maybe we should close this in favour of #30801 then, wdyt?

@danielmitterdorfer
Member

Yeah, makes sense to me @colings86.

@colings86
Contributor Author

Closing in favour of #30801

Tim-Brooks pushed a commit that referenced this issue May 23, 2018
This reverts commit 31251c9 introduced in #30695.

We suspect this commit is causing the OOMEs reported in #30811, and we will use this PR to test that hypothesis.
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020