Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up InternalEngine#resolveDocVersion(...) method #122374

Merged

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Feb 12, 2025

Use reference manager to get index reader instead of acquire a searcher. The latter involves creating an index searcher, which is not used and expensive as part of the resolveDocVersion(...) method. This method is invoked for each document that gets indexed.

image

Use reference manager to get index reader instead of acquire a searcher. The latter involved creating an index searcher, which is not used and expensive as part og resolveDocVersion() method.
@martijnvg martijnvg added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. :StorageEngine/TSDB You know, for Metrics labels Feb 12, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@martijnvg martijnvg marked this pull request as ready for review February 12, 2025 15:52
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine Team:Distributed Indexing Meta label for Distributed Indexing team labels Feb 12, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

final VersionsAndSeqNoResolver.DocIdAndVersion docIdAndVersion;
try (Searcher searcher = acquireSearcher("load_version", SearcherScope.INTERNAL)) {
var indexReader = referenceManager.acquire();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to hold a reference to the store before acquiring the referenceManager.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange that no test then failed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed: 8072907

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's very unlikely and might even be impossible to run into a closed store here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, resolveDocVersion is executed while holding a ref on the engine (preventing the engine to be closed and in turn the store to be closed too).

@martijnvg martijnvg force-pushed the InternalEngine_resolveDocVersion_method branch from c0a0199 to 8072907 Compare February 13, 2025 13:24
@martijnvg
Copy link
Member Author

I swapped the enhancement label with non-issue label. After further analysis the reason why constructing an IndexSearcher instance is so expensive is that from Lucene 10.1.0 (apache/lucene#13893), the leaf slices are computed eagerly whereas before leaf slices were computed lazily and in the context of resolveDocVersion(...) method leaf slices were never computed.

This is still the right change, the resolveDocVersion(...) method doesn't need an IndexSearches instances and just needs a DirectoryReader instance and so creating it is useless. Especially since the resolveDocVersion () method is invoked for each document with a non auto-generated id that gets indexed.

/* Acquire order here is store -> manager since we need
* to make sure that the store is not closed before
* the searcher is acquired. */
if (store.tryIncRef() == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah see @tlrx, we don't need to ever acquire a store because we are guaranteed to have an engine reference here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so we can just change that into an assertion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so yes, assertion should be enough :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed: b81fa73

}
}

abstract static class DirectoryReaderSupplier implements Releasable {
Copy link
Member

@original-brownbear original-brownbear Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only extended once, we can make this one class for the reader supplier can't we? Or better yet, maybe just inline the logic to save some cycles? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed: 1ee2699

@martijnvg martijnvg added v9.0.1 auto-backport Automatically create backport pull requests when merged labels Feb 14, 2025
Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :) thanks for the iterations Martijn!

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Martijn!

@martijnvg martijnvg merged commit 018b3ce into elastic:main Feb 14, 2025
17 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 14, 2025
Use reference manager to get index reader instead of acquiring a searcher. The latter involves creating an index searcher, which is not used and expensive as part of the `resolveDocVersion(...)` method, because this method is invoked for each document that gets indexed.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x
9.0

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 14, 2025
Use reference manager to get index reader instead of acquiring a searcher. The latter involves creating an index searcher, which is not used and expensive as part of the `resolveDocVersion(...)` method, because this method is invoked for each document that gets indexed.
elasticsearchmachine pushed a commit that referenced this pull request Feb 14, 2025
)

Use reference manager to get index reader instead of acquiring a searcher. The latter involves creating an index searcher, which is not used and expensive as part of the `resolveDocVersion(...)` method, because this method is invoked for each document that gets indexed.
elasticsearchmachine pushed a commit that referenced this pull request Feb 14, 2025
)

Use reference manager to get index reader instead of acquiring a searcher. The latter involves creating an index searcher, which is not used and expensive as part of the `resolveDocVersion(...)` method, because this method is invoked for each document that gets indexed.
original-brownbear added a commit that referenced this pull request Feb 24, 2025
Checking whether we need to refresh does not require a searcher
so we can simplify this to just work based on the reader
and avoid lots of contention etc. for setting up the searcher.

relates #122374
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Feb 24, 2025
Checking whether we need to refresh does not require a searcher
so we can simplify this to just work based on the reader
and avoid lots of contention etc. for setting up the searcher.

relates elastic#122374
elasticsearchmachine pushed a commit that referenced this pull request Feb 24, 2025
Checking whether we need to refresh does not require a searcher
so we can simplify this to just work based on the reader
and avoid lots of contention etc. for setting up the searcher.

relates #122374
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >non-issue :StorageEngine/TSDB You know, for Metrics Team:Distributed Indexing Meta label for Distributed Indexing team Team:StorageEngine v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants