Add support for derive source feature and Implementing it for DateFieldMapper #17383

Open: wants to merge 5 commits into main from tanik-derived-source-feature

tanik98 commented Feb 18, 2025

Description

  • Added an interface in FieldMapper for building the source from stored fields or doc values

  • Added a derivedSourceSupported flag to allowlist field mappers for this feature

  • Implemented these methods for DateFieldMapper

    • In DateFieldMapper, stored field values are stored as a Long, whereas doc values are stored as SortedNumeric in Lucene.
    • While deriving the source, we first check the stored field and, if no value is found, fall back to doc values. The only difference between the two is that the stored field preserves the original order of values for multi-valued fields.
  • Tested the changes by integrating the flow with the "doc by id" query path. Cases covered:

    1. store=false: the field value is fetched from doc values; for multi-valued fields, the values are returned in sorted order
    2. store=true: the field value is fetched from the stored field
    3. A date format is defined: the returned date preserves that format
    4. store=false and doc_values=false: an unsupported exception is thrown
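The lookup order and the four tested cases above can be modelled without Lucene in a small sketch. `DateSourceSketch` and `derive` are hypothetical names; the real code reads stored fields and SortedNumeric doc values from a `LeafReader` and writes into an `XContentBuilder`. Here a `List<Long>` stands in for the stored field, a `long[]` for doc values, and `java.time.DateTimeFormatter` (in UTC) is an assumed stand-in for the mapping's date format.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, Lucene-free model of the lookup order described in the PR description.
// storedValues stands in for the stored field (values kept in indexing order);
// docValues stands in for SortedNumeric doc values (returned in ascending order).
final class DateSourceSketch {

    static List<String> derive(boolean store, boolean docValuesEnabled,
                               List<Long> storedValues, long[] docValues,
                               DateTimeFormatter formatter) {
        if (!store && !docValuesEnabled) {
            // Case 4: neither source exists, so the source cannot be derived.
            throw new UnsupportedOperationException(
                "cannot derive source: both store and doc_values are disabled");
        }
        List<Long> millis = new ArrayList<>();
        if (store && storedValues != null && !storedValues.isEmpty()) {
            // Case 2: stored field found; the original order of multi-values is kept.
            millis.addAll(storedValues);
        } else if (docValuesEnabled && docValues != null) {
            // Case 1: fall back to doc values. Lucene already returns SortedNumeric
            // values in ascending order; sorting a copy here only models that.
            long[] sorted = docValues.clone();
            Arrays.sort(sorted);
            for (long v : sorted) {
                millis.add(v);
            }
        }
        List<String> rendered = new ArrayList<>();
        for (long m : millis) {
            // Case 3: render the epoch-millis long back into the mapping's date format.
            rendered.add(formatter.format(Instant.ofEpochMilli(m).atOffset(ZoneOffset.UTC)));
        }
        return rendered;
    }
}
```

With the same two dates indexed, the stored-field path returns them in indexing order, while the doc-values path returns them sorted, which is the only behavioural difference the description calls out.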

Related Issues

Resolves #17073

Part of feature #9568

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

❌ Gradle check result for 5aa6a8c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

* @param docId - docId for which we want to derive the source
* @throws IOException
*/
public void buildDerivedSource(XContentBuilder builder, LeafReader leafReader, int docId) throws IOException {
Member

Why do we need two separate methods, i.e. buildDerivedSource and deriveSource?

Author

Removed the buildDerivedSource method and moved the derived-source feasibility checks into a single method, canDeriveSource.
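A minimal sketch of what folding all feasibility checks into one method could look like. It assumes, as the `possibleToDeriveSource();` call in the diff below suggests, that the check throws rather than returning a boolean. `FeasibilitySketch` and its fields are hypothetical; the nested-object check reflects the reply further down that nested objects are out of scope initially.

```java
// Hypothetical sketch: every derived-source feasibility check gathered in one method.
// Field names are assumptions; the real mapper would read them from its MappedFieldType.
final class FeasibilitySketch {
    private final String name;
    private final boolean derivedSourceSupported; // field-type allowlist flag
    private final boolean stored;                 // store: true in the mapping
    private final boolean hasDocValues;           // doc_values: true in the mapping
    private final boolean insideNestedObject;     // nested objects are out of scope initially

    FeasibilitySketch(String name, boolean derivedSourceSupported, boolean stored,
                      boolean hasDocValues, boolean insideNestedObject) {
        this.name = name;
        this.derivedSourceSupported = derivedSourceSupported;
        this.stored = stored;
        this.hasDocValues = hasDocValues;
        this.insideNestedObject = insideNestedObject;
    }

    /** Returns normally if the source can be derived; throws otherwise. */
    void canDeriveSource() {
        if (!derivedSourceSupported) {
            throw new UnsupportedOperationException(
                "derived source is not supported for field [" + name + "]");
        }
        if (insideNestedObject) {
            throw new UnsupportedOperationException(
                "derived source is not yet supported inside nested objects: [" + name + "]");
        }
        if (!stored && !hasDocValues) {
            throw new UnsupportedOperationException(
                "field [" + name + "] needs store or doc_values to derive source");
        }
    }
}
```

Keeping every precondition in one place means callers make a single call before deriving, instead of having to remember which of two entry points performs which check.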

boolean isStoredFieldPresent = false;
// 1. Lookup stored field, which will help in preserving order of values in case of multi value field
// 2. If field value is not found using stored field, lookup doc values
if (mappedFieldType.isStored()) {
Member

Since stored fields are stored row-wise, reading them may turn out to be slower than reading doc values. Can we create an issue to benchmark which one to prefer when both are available?

Author

For fetching a few values, doc values would be faster in almost all cases, since they can be served from the OS page cache, and this is very efficient because values for many doc IDs can be served from the same page. Stored fields, by contrast, are stored row-wise, so a single memory page cannot serve as many doc IDs as it can with doc values.

However, when deriving the source for a document with multiple fields, using stored fields (assuming store is enabled for most fields) might perform well, because all fields of a single document sit close together in memory. We can benchmark this. For now, we are going with a doc-values-first approach, considering the wider case where store is not explicitly enabled in the index field mapping (it defaults to false).
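The trade-off above comes down to which source is read first. As a hedged sketch of what the proposed benchmark would compare, the two orders can be expressed as a strategy switch. `LookupOrderSketch`, `Preference`, and the supplier-based sources are hypothetical stand-ins for Lucene stored-field and doc-value reads; the sketch takes no position on which default is better.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: the two candidate lookup orders for deriving a field's values.
// Each source is modelled as a lazy supplier that returns Optional.empty() when the
// field has no value there, so the second source is only read if the first misses.
final class LookupOrderSketch {

    enum Preference { DOC_VALUES_FIRST, STORED_FIRST }

    static Optional<List<Long>> lookup(Preference preference,
                                       Supplier<Optional<List<Long>>> docValues,
                                       Supplier<Optional<List<Long>>> storedField) {
        boolean docValuesFirst = preference == Preference.DOC_VALUES_FIRST;
        Supplier<Optional<List<Long>>> first = docValuesFirst ? docValues : storedField;
        Supplier<Optional<List<Long>>> second = docValuesFirst ? storedField : docValues;
        Optional<List<Long>> result = first.get();
        // Fall back only when the preferred source had nothing for this document.
        return result.isPresent() ? result : second.get();
    }
}
```

For single-valued fields both orders return the same value; they differ only in cost and, for multi-valued fields, in whether the original order (stored field) or ascending order (doc values) comes back.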

* @param docId - docId for which we want to derive the source
* @throws IOException
*/
protected void deriveSource(XContentBuilder builder, LeafReader leafReader, int docId) throws IOException {
Member

Will this support nested and/or object fields?

Author

Initially, we will keep this feature disabled for nested objects; later we will extend it to nested object fields as well.

@@ -226,6 +232,58 @@ private static DateFieldMapper toType(FieldMapper in) {
return (DateFieldMapper) in;
}

@Override
protected void deriveSource(XContentBuilder builder, LeafReader leafReader, int docId) throws IOException {
possibleToDeriveSource();
Member

Is there a field-level parameter that enables this?

Author

Yes, the derivedSourceSupported flag defined in MappedFieldType controls the feature at the field level.
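A minimal sketch of per-type allowlisting through such a flag. The thread only says the flag lives in MappedFieldType; this sketch assumes it is fixed per field type through a constructor argument that defaults to false. `FieldTypeSketch`, `DateFieldTypeSketch`, and `UnsupportedFieldTypeSketch` are hypothetical names, not OpenSearch classes.

```java
// Hypothetical sketch of field-type-level allowlisting: the flag is fixed per field type,
// defaults to false, and only types that implement derivation opt in.
abstract class FieldTypeSketch {
    private final boolean derivedSourceSupported;

    protected FieldTypeSketch(boolean derivedSourceSupported) {
        this.derivedSourceSupported = derivedSourceSupported;
    }

    protected FieldTypeSketch() {
        this(false); // default: not allowlisted for derived source
    }

    final boolean isDerivedSourceSupported() {
        return derivedSourceSupported;
    }
}

// Opts in, analogous to the date type enabled in this PR.
final class DateFieldTypeSketch extends FieldTypeSketch {
    DateFieldTypeSketch() {
        super(true);
    }
}

// A type that has not implemented derivation keeps the default.
final class UnsupportedFieldTypeSketch extends FieldTypeSketch {
}
```

Defaulting to false keeps the feature opt-in: a field type only becomes eligible once its mapper implements derivation and sets the flag explicitly.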

Signed-off-by: Tanik Pansuriya <[email protected]>
tanik98 force-pushed the tanik-derived-source-feature branch from 5aa6a8c to ab3aa0a on February 20, 2025 06:05
Contributor

❌ Gradle check result for ab3aa0a: FAILURE

Contributor

❌ Gradle check result for b0d4600: FAILURE


tanik98 requested a review from mgodwan on February 20, 2025 07:00
Labels: enhancement (Enhancement or improvement to existing feature or request), Indexing:Performance
Projects: None yet
Development: successfully merging this pull request may close these issues:
[Derived Source] Add support for deriving source field in FieldMapper
3 participants