Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[system tests] Validate transforms based on the mappings #2347

Open
wants to merge 56 commits into
base: main
Choose a base branch
from

Conversation

mrodm
Copy link
Contributor

@mrodm mrodm commented Jan 16, 2025

Relates #2341

Add support to validate transforms created in packages (if any) based on their mappings.

As there are some packages that once the documents are processed are also deleted, it's required to be able to check whether or not its destination index has deleted docs.

Concerns:

  • Testing packages defining transforms are going to be slower, since it is needed to wait for docs (or deleted docs).
    • This is required in order to have the destination index the actual mappings after being processed the docs.

Author's Checklist

mrodm added 30 commits December 16, 2024 19:15
multi_fields are now compared directly with dynamic templates or ECS
fields. It was required to know whether or not that field was a
multi_field or not to apply some specific validations.
…ing for other dynamic templates if parameters do not match
@mrodm mrodm requested a review from jsoriano January 23, 2025 14:54
@mrodm
Copy link
Contributor Author

mrodm commented Jan 23, 2025

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12387

mrodm added 2 commits January 24, 2025 15:37
Do not validate transform fields based on mappings for those packages
where the kibana version is lower than 8.14.0. There is an known issue
related to how mappings are generated.
Related issue: elastic/kibana#175331
Comment on lines +2045 to +2048
// Before stack version 8.14.0, there are some issues generating the corresponding
// mappings for the fields defined in the transforms
// Forced to not use mappings to validate transforms before 8.14.0
// Related issue: https://github.com/elastic/kibana/issues/175331
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some issues with the generated mappings for the transforms defined in some packages. If two group fields were defined (with different fields under it), they were not merged. This happened for instance in github package.

It looks like it was solved here elastic/kibana#177608 and it is available/solved starting in stack version 8.14.0

Added for now a way to not run validated based on mappings (just in transforms).

@mrodm
Copy link
Contributor Author

mrodm commented Jan 27, 2025

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12387

@mrodm
Copy link
Contributor Author

mrodm commented Jan 27, 2025

/test

@@ -750,7 +752,7 @@ func (r *tester) getDocs(ctx context.Context, dataStream string) (*hits, error)
return &hits{}, nil
}
if resp.IsError() {
return nil, fmt.Errorf("failed to search docs for data stream %s: %s", dataStream, resp.String())
return nil, fmt.Errorf("failed to search docs for index or data stream %s: %s", dataStream, resp.String())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep just data stream to avoid being too much verbose here?
index refers when the process waits for the docs to appear in transforms.

@mrodm
Copy link
Contributor Author

mrodm commented Jan 27, 2025

/test

@mrodm
Copy link
Contributor Author

mrodm commented Jan 27, 2025

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12387

@mrodm
Copy link
Contributor Author

mrodm commented Jan 27, 2025

@jsoriano after some changes, this PR is ready for review again, thanks in advance!

// Related issue: https://github.com/elastic/kibana/issues/175331
validationMethod := r.fieldValidationMethod
if stackVersion.LessThan(semver_8_14_0) {
logger.Debugf("Forced to validate transforms based on fields, not available for stack versions < 8.14.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
}
return processed >= len(transformDocs), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we are comparing different things here:

  • len(transformDocs) is the number of documents resulting from the preview.
  • processed is the number of processed documents from the source data stream/s.

So, if I understand correctly, processed can be bigger than len(transformDocs) before all the documents in the data stream are processed.

I would expect this to stop when processed >= number of docs in the source index at some point to be decided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, IIUC you mean to compare processed with the number of documents found in the source data stream just before validating the transform, is that right? @jsoriano

This number of documents in the data stream could be added as first step into validateTransformsWithMappings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mean to compare with the number of documents in the source data stream. What I am not so sure is when to check for this number.

This number of documents in the data stream could be added as first step into validateTransformsWithMappings.

If we are sure that we have enough documents at this point it would be perfect, yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This number of documents in the data stream could be added as first step into validateTransformsWithMappings.

If we are sure that we have enough documents at this point it would be perfect, yes.

Not totally 100% sure, but currently I cannot think of any other better location. This could vary for each package, since there could be packages that are ingesting documents continuously or they do not have ingested all the required documents in the data stream validation (related to #2378).

@mrodm
Copy link
Contributor Author

mrodm commented Jan 29, 2025

test integrations

func (r *tester) validateTransformsWithMappings(ctx context.Context, transformId, transformName, destIndexTransform, indexTemplateTransform string, transformDocs []common.MapStr, fieldsValidator *fields.Validator) error {
func (r *tester) validateTransformsWithMappings(ctx context.Context, sourceDataStream, transformId, transformName, destIndexTransform, indexTemplateTransform string, transformDocs []common.MapStr, fieldsValidator *fields.Validator) error {
logger.Debugf("Searching the number of documents found in source data stream %q before validating transform %q", sourceDataStream, transformId)
sourceDataStreamHits, err := r.getDocs(ctx, sourceDataStream)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, thinking about #2341 (comment), we should probably use the source index here, because in cases where there are differences we would be also comparing different things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the source index in this context (where the transform is reading the documents to process) is the source data stream of the running test. If so, this would mean the data stream that the scenario.dataStream variable refers to.

Is that what you meant here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sourceDataStream is the scenario.dataStream, right?

The source index in a transform is a pattern, that can match multiple indexes or data streams. As mentioned in #2341 (comment) there can be several data streams matching the same transform source index. In the case of system tests, I guess this means that the source index can match other data streams apart of sourceDataStream/scenario.dataStream.

So here we are getting the docs written by the test into its data stream, and using its count to decide when to stop the test. We are comparing the number of docs in the scenario data stream with the documents of all the matching data streams.

In the case where there are more than one data streams matching, we might be continuing even if none of the expected documents has been processed.

I think we should use here the transform source index to be sure that we check the transform after all the documents in all datastreams (including the scenario one) are processed. But maybe this is quite a corner case, not sure about the cases we have.

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @mrodm

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12387

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants