Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Update installation mechanism for Transforms in Fleet according to new specifications #140046

Merged
merged 27 commits into from
Sep 20, 2022
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
6a276de
Installation
qn895 Aug 30, 2022
ceca037
Installation
qn895 Sep 1, 2022
66a1886
Clean up and ensure uninstallation removes all transforms
qn895 Sep 6, 2022
679d000
Merge remote-tracking branch 'upstream/main' into ml-transform-instal…
qn895 Sep 6, 2022
79c7127
Undo extra lines
qn895 Sep 6, 2022
1f02beb
Clean up
qn895 Sep 6, 2022
5aec440
Merge remote-tracking branch 'upstream/main' into ml-transform-instal…
qn895 Sep 7, 2022
c984cf5
Fix merging
qn895 Sep 7, 2022
911f676
Merge remote-tracking branch 'upstream/main' into ml-transform-instal…
qn895 Sep 8, 2022
807db41
Rename installTransform -> installTransforms, remove job references
qn895 Sep 8, 2022
822ca03
Improve logic for merging index template and transform, add unit tests
qn895 Sep 8, 2022
fe5ab2f
Remove todo
qn895 Sep 8, 2022
04c7b51
Merge remote-tracking branch 'upstream/main' into ml-transform-instal…
qn895 Sep 9, 2022
a83c0aa
Ensure backward compatibility for legacy json assets
qn895 Sep 9, 2022
d0aac73
Remove todo
qn895 Sep 11, 2022
c5c9643
Fix missing legacy transform name for reference
qn895 Sep 12, 2022
ff35926
Merge remote-tracking branch 'upstream/main' into ml-transform-instal…
qn895 Sep 12, 2022
efe1d8c
Fix jest test
qn895 Sep 12, 2022
fa7e8e2
Refactor legacy and new installations
qn895 Sep 14, 2022
8fa543f
Separate out logic, add component template
qn895 Sep 14, 2022
ae38fc9
Remove now unused merge_index_template_mappings
qn895 Sep 14, 2022
f2d92ec
Merge remote-tracking branch 'upstream/main' into ml-transform-instal…
qn895 Sep 14, 2022
5a98445
Add retries, switch to using buildComponentTemplates
qn895 Sep 15, 2022
572e2e5
Merge remote-tracking branch 'upstream/main' into ml-transform-instal…
qn895 Sep 15, 2022
203b0ab
Merge branch 'main' into ml-transform-installation-fleet
kibanamachine Sep 19, 2022
eb060c0
Merge remote-tracking branch 'upstream/main' into ml-transform-instal…
qn895 Sep 20, 2022
09a7c08
Update template names to not have version
qn895 Sep 20, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -7,26 +7,43 @@

import type { ElasticsearchClient, Logger, SavedObjectsClientContract } from '@kbn/core/server';
import { errors } from '@elastic/elasticsearch';
import { safeLoad } from 'js-yaml';
import { isPopulatedObject } from '@kbn/ml-is-populated-object';

import { installComponentAndIndexTemplateForDataStream } from '../template/install';
import { processFields } from '../../fields/field';
import { generateMappings } from '../template/template';
import { getESAssetMetadata } from '../meta';
import { updateEsAssetReferences } from '../../packages/install';
import { getPathParts } from '../../archive';
import { ElasticsearchAssetType } from '../../../../../common/types/models';
import type { EsAssetReference, InstallablePackage } from '../../../../../common/types/models';
import type {
EsAssetReference,
InstallablePackage,
ESAssetMetadata,
IndexTemplate,
} from '../../../../../common/types/models';
import { getInstallation } from '../../packages';

import { getESAssetMetadata } from '../meta';

import { retryTransientEsErrors } from '../retry';

import { mergeIndexTemplateWithMappings } from './merge_index_template_mappings';
import { deleteTransforms } from './remove';
import { getAsset } from './common';

interface TransformInstallation {
interface TransformModuleBase {
transformModuleId?: string;
}
interface DestinationIndexTemplateInstallation extends TransformModuleBase {
installationName: string;
_meta: ESAssetMetadata;
template: IndexTemplate['template'];
}
interface TransformInstallation extends TransformModuleBase {
installationName: string;
content: any;
}

export const installTransform = async (
export const installLegacyTransforms = async (
installablePackage: InstallablePackage,
paths: string[],
esClient: ElasticsearchClient,
Expand Down Expand Up @@ -65,7 +82,7 @@ export const installTransform = async (
if (transformPaths.length > 0) {
const transformRefs = transformPaths.reduce<EsAssetReference[]>((acc, path) => {
acc.push({
id: getTransformNameForInstallation(installablePackage, path, installNameSuffix),
id: getLegacyTransformNameForInstallation(installablePackage, path, installNameSuffix),
type: ElasticsearchAssetType.transform,
});

Expand All @@ -87,7 +104,7 @@ export const installTransform = async (
content._meta = getESAssetMetadata({ packageName: installablePackage.name });

return {
installationName: getTransformNameForInstallation(
installationName: getLegacyTransformNameForInstallation(
installablePackage,
path,
installNameSuffix
Expand Down Expand Up @@ -117,6 +134,218 @@ export const installTransform = async (
return { installedTransforms, esReferences };
};

export const installTransforms = async (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is quite long, are there any bits logic that could be pulled out into separate functions and unit tested?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here fa7e8e2

installablePackage: InstallablePackage,
paths: string[],
esClient: ElasticsearchClient,
savedObjectsClient: SavedObjectsClientContract,
logger: Logger,
esReferences?: EsAssetReference[]
) => {
const transformPaths = paths.filter((path) => isTransform(path));
// If package contains legacy transform specifications (i.e. with json instead of yml)
if (transformPaths.some((p) => p.endsWith('.json')) || transformPaths.length === 0) {
return await installLegacyTransforms(
installablePackage,
paths,
esClient,
savedObjectsClient,
logger,
esReferences
);
}

// Assuming the package will only contain .yml specifications
const installation = await getInstallation({
savedObjectsClient,
pkgName: installablePackage.name,
});

esReferences = esReferences ?? installation?.installed_es ?? [];

let previousInstalledTransformEsAssets: EsAssetReference[] = [];
if (installation) {
previousInstalledTransformEsAssets = installation.installed_es.filter(
({ type, id }) => type === ElasticsearchAssetType.transform
);
if (previousInstalledTransformEsAssets.length) {
logger.info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is debug log level more appropriate for this message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously logger.info but I can definitely change it to debug level if it's more appropriate 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here fa7e8e2

`Found previous transform references:\n ${JSON.stringify(
previousInstalledTransformEsAssets
)}`
);
}
}

// delete all previous transform
await deleteTransforms(
esClient,
previousInstalledTransformEsAssets.map((asset) => asset.id)
);

const installNameSuffix = `${installablePackage.version}`;
let installedTransforms: EsAssetReference[] = [];
if (transformPaths.length > 0) {
const transformsSpecifications = new Map();
const destinationIndexTemplates: DestinationIndexTemplateInstallation[] = [];
const indices = [];
const transforms: TransformInstallation[] = [];

transformPaths.forEach((path: string) => {
const { transformModuleId, fileName } = getTransformFolderAndFileNames(
installablePackage,
path
);

// Since there can be multiple assets per transform definition
// We want to create a unique list of assets/specifications for each transform
if (transformsSpecifications.get(transformModuleId) === undefined) {
transformsSpecifications.set(transformModuleId, new Map());
}
const packageAssets = transformsSpecifications.get(transformModuleId);

const content = safeLoad(getAsset(path).toString('utf-8'));

if (fileName === 'fields') {
const validFields = processFields(content);
const mappings = generateMappings(validFields);
packageAssets?.set('mappings', mappings);
}

if (fileName === 'transform') {
transformsSpecifications.get(transformModuleId)?.set('destinationIndex', content.dest);
indices.push(content.dest);
transformsSpecifications.get(transformModuleId)?.set('transform', content);
content._meta = getESAssetMetadata({ packageName: installablePackage.name });
transforms.push({
transformModuleId,
installationName: getTransformNameForInstallation(
installablePackage,
transformModuleId,
installNameSuffix
),
content,
});
indices.push(content.dest);
}

if (fileName === 'manifest') {
if (isPopulatedObject(content, ['start']) && content.start === false) {
transformsSpecifications.get(transformModuleId)?.set('start', false);
}
// If manifest.yml contains destination_index_template
// Combine the mappings and other index template settings from manifest.yml into a single index template
// Create the index template and track the template in EsAssetReferences
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered using component templates here instead of manually merging?

When we create data streams in fleet we have a structure which allows users to customise the data streams if they need to, here is the de doc describing the structure:

### Component Templates (as of 8.2)

The issue with having all of the settings and mappings on the index template is that the user has to modify the index template to make changes, which will then be overwritten on package update. the @custom component template gets around this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting information, but also implies immense complexity, so I think we'll need to take a step back and set some boundaries on the scope of transforms in packages.

The complexity comes because if the package data stream, which provides the transform's source data, can be infinitely customised by the user then the transform itself may need to be customised to match.

In the case of a pivot transform maybe it's OK to say that the transform doesn't do anything with the custom fields, so the transform destination index wouldn't need to contain them.

For a latest transform maybe we can just include the same @custom component template in the destination mappings that was included in the source mappings, and make a rule that the fields used to determine "latest" have to be defined by the package and not be custom fields.

This might avoid the need to have the ability to change the behaviour of the transform itself.

It's also interesting that in the transforms that have been added to packages in the past the namespace has been hardcoded as default, so there hasn't been any attempt in the past to make a packaged transform generic enough to work with different namespaces with different mappings - see for example: https://github.com/elastic/elasticsearch/blob/498a24b61f5e37148f48e16c9d47d6c49af08aac/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/store/ReservedRolesStore.java#L808-L815

What's the package with the most complex combination of component templates we have today? I think it would help if we could look at how all the flexibility is currently being used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@droberts195 Reading the original specification elastic/package-spec#307 again and I think I misunderstood the default part originally. You're right, it should be according the namespace and not be hardcoded in the name. I'll update that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the package with the most complex combination of component templates we have today? I think it would help if we could look at how all the flexibility is currently being used.

AWS ships probably the most component template of any integration since it has so many policy templates.

image

If you install the AWS integration w/ many of its inputs enabled you'll get quite a lot of component templates installed. I don't know if this is necessarily "complex" as they don't do anything too involved from what I can tell, but maybe this is helpful here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't discussed customization as a requirement for transforms in packages and IMO we should exclude that from the scope for now. I do think it's a good idea to avoid any manual merging on the Kibana side if possible.

I'm curious if we actually need to support both destination_index_template and mappings in the package-spec? Would it be possible to only allow one or the other and throw an error otherwise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some discussion with Mark, I switched to using buildComponentTemplates helper to handle the logic for better consistency with what Fleet is doing. However, if we think it's better to throw error instead of both are defined I can definitely do that as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here 8fa543f

if (
isPopulatedObject(content, ['destination_index_template']) ||
isPopulatedObject(packageAssets.get('mappings'))
) {
const destinationIndexTemplate =
(content.destination_index_template as Record<string, unknown>) ?? {};
destinationIndexTemplates.push({
transformModuleId,
_meta: getESAssetMetadata({ packageName: installablePackage.name }),
installationName: getTransformNameForInstallation(
installablePackage,
transformModuleId,
installNameSuffix,
'template'
),
template: destinationIndexTemplate,
} as DestinationIndexTemplateInstallation);
packageAssets.set('destinationIndexTemplate', destinationIndexTemplate);
}
}
});

const indexTemplatesRefs = destinationIndexTemplates.map((template) => ({
id: template.installationName,
type: ElasticsearchAssetType.indexTemplate,
}));

const transformRefs = transforms.map((t) => ({
id: t.installationName,
type: ElasticsearchAssetType.transform,
}));

// get and save refs associated with the transforms before installing
esReferences = await updateEsAssetReferences(
savedObjectsClient,
installablePackage.name,
esReferences,
{
assetsToAdd: [...indexTemplatesRefs, ...transformRefs],
assetsToRemove: previousInstalledTransformEsAssets,
}
);

await Promise.all(
destinationIndexTemplates
.map((destinationIndexTemplate) => {
const mergedTemplate = mergeIndexTemplateWithMappings(
destinationIndexTemplate.template,
transformsSpecifications
.get(destinationIndexTemplate.transformModuleId)
?.get('mappings')
);
if (mergedTemplate !== undefined) {
return installComponentAndIndexTemplateForDataStream({
esClient,
logger,
componentTemplates: {},
indexTemplate: {
templateName: destinationIndexTemplate.installationName,
// @ts-expect-error Index template here should not contain data_stream property
// as this template is applied to only an index and not a data stream
indexTemplate: {
template: mergedTemplate,
priority: 250,
index_patterns: [
transformsSpecifications
.get(destinationIndexTemplate.transformModuleId)
?.get('destinationIndex').index,
],
_meta: destinationIndexTemplate._meta,
composed_of: [],
},
},
});
}
})
.filter((p) => p !== undefined)
);
await Promise.all(
transforms.map(async (transform) => {
const index = transform.content.dest.index;
const pipelineId = transform.content.dest.pipeline;

const indexExist = await esClient.indices.exists({
index,
});
if (indexExist !== true) {
return esClient.indices.create({
index,
...(pipelineId ? { settings: { default_pipeline: pipelineId } } : {}),
});
}
})
);

const transformsPromises = transforms.map(async (transform) => {
return handleTransformInstall({
esClient,
logger,
transform,
startTransform: transformsSpecifications.get(transform.transformModuleId)?.get('start'),
});
});

installedTransforms = await Promise.all(transformsPromises).then((results) => results.flat());
}

return { installedTransforms, esReferences };
};

export const isTransform = (path: string) => {
const pathParts = getPathParts(path);
return !path.endsWith('/') && pathParts.type === ElasticsearchAssetType.transform;
Expand All @@ -126,10 +355,12 @@ async function handleTransformInstall({
esClient,
logger,
transform,
startTransform,
}: {
esClient: ElasticsearchClient;
logger: Logger;
transform: TransformInstallation;
startTransform?: boolean;
}): Promise<EsAssetReference> {
try {
await retryTransientEsErrors(
Expand All @@ -151,15 +382,20 @@ async function handleTransformInstall({
throw err;
}
}
await esClient.transform.startTransform(
{ transform_id: transform.installationName },
{ ignore: [409] }
);

// start transform by default if not set in yml file
// else, respect the setting
if (startTransform === undefined || startTransform === true) {
await esClient.transform.startTransform(
{ transform_id: transform.installationName },
{ ignore: [409] }
);
}

return { id: transform.installationName, type: ElasticsearchAssetType.transform };
}

const getTransformNameForInstallation = (
const getLegacyTransformNameForInstallation = (
installablePackage: InstallablePackage,
path: string,
suffix: string
Expand All @@ -169,3 +405,27 @@ const getTransformNameForInstallation = (
const folderName = pathPaths?.pop();
return `${installablePackage.name}.${folderName}-${filename}-${suffix}`;
};

const getTransformNameForInstallation = (
installablePackage: InstallablePackage,
transformModuleId: string,
suffix: string,
assetType?: string
) => {
return `logs-${installablePackage.name}.${transformModuleId}-${
assetType === undefined ? 'default' : assetType
}-${suffix}`;
};

const getTransformFolderAndFileNames = (installablePackage: InstallablePackage, path: string) => {
const pathPaths = path.split('/');
const fileName = pathPaths?.pop()?.split('.')[0];
let transformModuleId = pathPaths?.pop();

// If fields.yml is located inside a directory called 'fields' (e.g. {exampleFolder}/fields/fields.yml)
// We need to go one level up to get the real folder name
if (transformModuleId === 'fields') {
transformModuleId = pathPaths?.pop();
}
return { fileName: fileName ?? '', transformModuleId: transformModuleId ?? '' };
};
Loading