[system tests] Validate transforms based on the mappings #2347
base: main
Conversation
multi_fields are now compared directly with dynamic templates or ECS fields. It was required to know whether a field was a multi_field in order to apply some specific validations.
…ing for other dynamic templates if parameters do not match
test integrations
Created or updated PR in integrations repository to test this version. Check elastic/integrations#12387
Do not validate transform fields based on mappings for those packages where the Kibana version is lower than 8.14.0. There is a known issue related to how mappings are generated. Related issue: elastic/kibana#175331
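A minimal sketch of such a version gate, assuming the Masterminds semver package; the selectValidationMethod helper and the "fields"/"mappings" strings are illustrative placeholders, not elastic-package's actual API:

```go
package main

import (
	"fmt"

	"github.com/Masterminds/semver/v3"
)

// First stack version where the mappings generated for transform fields are reliable.
var semver8_14_0 = semver.MustParse("8.14.0")

// selectValidationMethod falls back to field-based validation for stacks older
// than 8.14.0, where mapping-based validation of transforms cannot be trusted
// (see elastic/kibana#175331).
func selectValidationMethod(stackVersion *semver.Version, requested string) string {
	if stackVersion.LessThan(semver8_14_0) {
		return "fields"
	}
	return requested
}

func main() {
	v := semver.MustParse("8.13.4")
	fmt.Println(selectValidationMethod(v, "mappings")) // prints "fields"
}
```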
// Before stack version 8.14.0, there are some issues generating the corresponding
// mappings for the fields defined in the transforms
// Forced to not use mappings to validate transforms before 8.14.0
// Related issue: https://github.com/elastic/kibana/issues/175331
Found some issues with the generated mappings for the transforms defined in some packages. If two group fields were defined (with different fields under them), they were not merged. This happened, for instance, in the github package.
It looks like this was solved in elastic/kibana#177608, and the fix is available starting in stack version 8.14.0.
Added, for now, a way to not run validation based on mappings (just for transforms).
test integrations
Created or updated PR in integrations repository to test this version. Check elastic/integrations#12387
/test
@@ -750,7 +752,7 @@ func (r *tester) getDocs(ctx context.Context, dataStream string) (*hits, error)
 		return &hits{}, nil
 	}
 	if resp.IsError() {
-		return nil, fmt.Errorf("failed to search docs for data stream %s: %s", dataStream, resp.String())
+		return nil, fmt.Errorf("failed to search docs for index or data stream %s: %s", dataStream, resp.String())
Should we keep just data stream here, to avoid being too verbose?
index refers to the case when the process waits for the docs to appear in transforms.
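For context, the same helper can serve both cases because the Elasticsearch _search API accepts either a data stream name or a concrete index name (such as a transform destination index) as target. A minimal, hypothetical sketch assuming plain HTTP access, not the actual getDocs implementation:

```go
package transformcheck

import (
	"fmt"
	"net/http"
)

// searchDocs is an illustrative helper: the target can be a data stream name
// or a transform destination index, and the _search call is identical.
func searchDocs(esURL, target string) (*http.Response, error) {
	return http.Get(fmt.Sprintf("%s/%s/_search", esURL, target))
}
```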
/test
test integrations
Created or updated PR in integrations repository to test this version. Check elastic/integrations#12387
@jsoriano after some changes, this PR is ready for review again, thanks in advance!
// Related issue: https://github.com/elastic/kibana/issues/175331
validationMethod := r.fieldValidationMethod
if stackVersion.LessThan(semver_8_14_0) {
	logger.Debugf("Forced to validate transforms based on fields, not available for stack versions < 8.14.0")
👍
}
}
return processed >= len(transformDocs), nil
Looks like we are comparing different things here: len(transformDocs) is the number of documents resulting from the preview, while processed is the number of processed documents from the source data stream(s).
So, if I understand correctly, processed can be bigger than len(transformDocs) before all the documents in the data stream are processed. I would expect this to stop when processed >= the number of docs in the source index (at some point to be decided).
So, IIUC you mean to compare processed with the number of documents found in the source data stream just before validating the transform, is that right? @jsoriano
This number of documents in the data stream could be obtained as a first step in validateTransformsWithMappings.
Yes, I mean to compare with the number of documents in the source data stream. What I am not so sure about is when to check this number.
> This number of documents in the data stream could be obtained as a first step in validateTransformsWithMappings.
If we are sure that we have enough documents at this point, it would be perfect, yes.
> This number of documents in the data stream could be obtained as a first step in validateTransformsWithMappings.
> If we are sure that we have enough documents at this point, it would be perfect, yes.
Not totally sure, but currently I cannot think of a better location. This could vary for each package, since there could be packages that ingest documents continuously, or that have not ingested all the required documents during the data stream validation (related to #2378).
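To make the agreed approach concrete, a rough sketch of the stop condition discussed in this thread could look like the following. The function names and signatures are assumptions, not the PR's code; the point illustrated is only the comparison against the source data stream count instead of len(transformDocs).

```go
package transformcheck

import (
	"context"
	"fmt"
	"time"
)

// waitUntilTransformCaughtUp polls until the transform has processed at least as
// many documents as are present in the scenario's source data stream. The two
// function parameters stand in for the real getDocs/transform-stats calls.
func waitUntilTransformCaughtUp(ctx context.Context,
	countSourceDocs func(context.Context) (int64, error),
	countProcessedDocs func(context.Context) (int64, error),
	interval time.Duration,
) error {
	// Number of documents in the source data stream, taken just before starting
	// to validate the transform, as discussed above.
	want, err := countSourceDocs(ctx)
	if err != nil {
		return fmt.Errorf("failed to count docs in source data stream: %w", err)
	}

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		// "processed" counts source documents handled by the transform, so it is
		// compared against the source doc count rather than the preview size.
		processed, err := countProcessedDocs(ctx)
		if err != nil {
			return err
		}
		if processed >= want {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```

With something like this, the transform would only be validated once it has caught up with whatever the scenario wrote to its source data stream.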
test integrations
-func (r *tester) validateTransformsWithMappings(ctx context.Context, transformId, transformName, destIndexTransform, indexTemplateTransform string, transformDocs []common.MapStr, fieldsValidator *fields.Validator) error {
+func (r *tester) validateTransformsWithMappings(ctx context.Context, sourceDataStream, transformId, transformName, destIndexTransform, indexTemplateTransform string, transformDocs []common.MapStr, fieldsValidator *fields.Validator) error {
+	logger.Debugf("Searching the number of documents found in source data stream %q before validating transform %q", sourceDataStream, transformId)
+	sourceDataStreamHits, err := r.getDocs(ctx, sourceDataStream)
Umm, thinking about #2341 (comment), we should probably use the source index here, because in cases where there are differences we would also be comparing different things.
IIUC, the source index in this context (where the transform is reading the documents to process) is the source data stream of the running test. If so, this would mean the data stream that the scenario.dataStream variable refers to.
Is that what you meant here?
sourceDataStream is the scenario.dataStream, right?
The source index in a transform is a pattern that can match multiple indexes or data streams. As mentioned in #2341 (comment), there can be several data streams matching the same transform source index. In the case of system tests, I guess this means that the source index can match other data streams apart from sourceDataStream/scenario.dataStream.
So here we are getting the docs written by the test into its data stream, and using their count to decide when to stop the test. We are comparing the number of docs in the scenario data stream with the documents of all the matching data streams. In the case where more than one data stream matches, we might be continuing even if none of the expected documents has been processed.
I think we should use the transform source index here, to be sure that we check the transform after all the documents in all data streams (including the scenario one) are processed. But maybe this is quite a corner case; I'm not sure about the cases we have.
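For illustration, counting documents over the transform's source index pattern (rather than over the single scenario data stream) could look roughly like the sketch below. The helper name and the plain-HTTP access to Elasticsearch are assumptions; the real code would go through the configured client. The _count API accepts index patterns and data stream names alike.

```go
package transformcheck

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// countDocsMatching returns the number of documents matching the given index
// pattern, for example a transform's source index.
func countDocsMatching(esURL, indexPattern string) (int64, error) {
	resp, err := http.Get(fmt.Sprintf("%s/%s/_count", esURL, indexPattern))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %s counting docs for %s", resp.Status, indexPattern)
	}
	var body struct {
		Count int64 `json:"count"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	return body.Count, nil
}
```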
💚 Build Succeeded
cc @mrodm
Created or updated PR in integrations repository to test this version. Check elastic/integrations#12387
Relates #2341
Add support to validate transforms created in packages (if any) based on their mappings.
As there are some packages where documents are deleted once they are processed, it is required to be able to check whether the transform's destination index has deleted docs.
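A rough sketch of such a deleted-docs check, assuming plain HTTP access to Elasticsearch and the index _stats API (which reports docs.count and docs.deleted); this is not the PR's actual implementation:

```go
package transformcheck

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// indexDocsStats maps the subset of the GET /<index>/_stats/docs response used here.
type indexDocsStats struct {
	All struct {
		Primaries struct {
			Docs struct {
				Count   int64 `json:"count"`
				Deleted int64 `json:"deleted"`
			} `json:"docs"`
		} `json:"primaries"`
	} `json:"_all"`
}

// destIndexHasDeletedDocs reports whether the transform destination index shows
// any deleted documents in its index stats.
func destIndexHasDeletedDocs(esURL, destIndex string) (bool, error) {
	resp, err := http.Get(fmt.Sprintf("%s/%s/_stats/docs", esURL, destIndex))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %s requesting stats for %s", resp.Status, destIndex)
	}
	var stats indexDocsStats
	if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
		return false, err
	}
	return stats.All.Primaries.Docs.Deleted > 0, nil
}
```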
Concerns:
Author's Checklist