Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sample "input" packages #325

Merged
merged 9 commits into from
May 4, 2022
Merged

Conversation

mtojek
Copy link
Contributor

@mtojek mtojek commented Apr 27, 2022

This PR is a follow-up on the design document.

I will start with the sample package and once we agree on properties, I will write down the spec.

@mtojek mtojek requested review from ruflin and joshdover April 27, 2022 15:07
@mtojek mtojek self-assigned this Apr 27, 2022
categories:
- custom
policy_templates:
- name: first_policy_template
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't put the elasticsearch block here. Should I add few/all options from here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wouldn't see a reason not to support the same options. Unless there is something we want to deprecate from the data stream manifests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we have:

elasticsearch.index_template.settings
elasticsearch.index_template.mappings
elasticsearch.index_template.ingest_pipeline.name
elasticsearch.privileges.indices

As we don't know in advance what would be the target index/ingest pipeline, we can't set these properties:
elasticsearch.index_template.ingest_pipeline.name
elasticsearch.privileges.indices

Am I right?

Copy link
Member

@jsoriano jsoriano Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we don't know in advance what would be the target index/ingest pipeline, we can't set these properties:
elasticsearch.index_template.ingest_pipeline.name
elasticsearch.privileges.indices

Umm, I don't see these are used in current integrations repo. What are use cases for these options in integration packages? Could they apply in input packages?

In any case, maybe you are right with not adding the elasticsearch block by now, so we start small and we see later possible use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, I don't see these are used in current integrations repo. What are use cases for these options in integration packages? Could they apply in input packages?

If you can find any package here, then most likely it's APM or Endpoint.

@elasticmachine
Copy link

elasticmachine commented Apr 27, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-04-29T13:23:28.425+0000

  • Duration: 3 min 33 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Copy link
Collaborator

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ on having 2-3 sample packages to derive the spec.

multi: true
required: true
show_user: true
default:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a sample package but I wonder if an input package should even have defaults here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that "defaults" can limit the amount of work on the initial configuration. There might be cases where you don't know what to do and would like to kick off the ingesting pipeline with default settings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep support for default, but I wouldn't put any default paths in the package for custom logs. I think that we can expect that a user knows at least what logs they want to collect :)

An example of a default in a log input package could be to have a variable to support ignore_older, that would default to 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not real Custom logs package. I rather wanted to present all options there, which seem to be valid in terms of spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you can have paths as example without defaults, and ignore_older as example with defaults 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

license: basic
categories:
- custom
policy_templates:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do policy templates make sense for input packages? It is really convenient that we can reuse the same logic as we already have so I'm not necessarily asking to change anything here but have a discussion.

  • Can an input package have more then one policy_templates?
  • Does an input package need all features of policy_templates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does an input package need all features of policy_templates?

It looks like we can improve on this. These are the properties available in integration's policy_template:

  • name
  • title
  • categories
  • description
  • data_streams
  • inputs
  • multiple (?)
  • icons
  • screenshot

Clearly, we don't need all of them, so I assumed that this policy_template (a different one in terms of spec) will contain a subset (or a different set) of properties. If you consider a different name for policy_template, do you have any suggestions? maybe an input_template?

Can an input package have more then one policy_templates?

Good question, but I believe that the answer might be the same as for policy_templates. In most cases, we will need one. What should we do in the future if we scope this for one only :)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the spec allow to use the same name but for example if it is defined as input package, to not allow certain params?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be totally new spec files here, so yes. We may consider extracting some common parts or referring to the integration type temporarily (for example, to fields properties).

Once we agree on the look, I will write down the spec files. I don't want to iterate on both at the same time as it usually means more work.

@mtojek mtojek requested a review from jsoriano April 28, 2022 09:51
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

categories:
- custom
policy_templates:
- name: first_policy_template
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wouldn't see a reason not to support the same options. Unless there is something we want to deprecate from the data stream manifests.

multi: true
required: true
show_user: true
default:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep support for default, but I wouldn't put any default paths in the package for custom logs. I think that we can expect that a user knows at least what logs they want to collect :)

An example of a default in a log input package could be to have a variable to support ignore_older, that would default to 0.

multi: false
required: true
show_user: false
default: "variables"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For such cases, it would be great to have a pattern or allowed values. cc @joshdover

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played with patterns (as regexp) here #245. I would prefer a list of allowed values :)

@mtojek
Copy link
Contributor Author

mtojek commented Apr 29, 2022

Do you think that we need more sample packages or is it enough to start drafting spec files?

{{#each tags}}
- {{this}}
{{/each}}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a few minimum additions to this, that needs to be in every input package, and many of them are in other integrations as well:

Pipelines:

{{#if pipeline}}
pipeline: {{pipeline}}
{{/if}}

Tags are slightly wrong, they should be like this:

{{#if tags}}
tags:
{{#each tags as |tag i|}}
  - {{tag}}
{{/each}}
{{/if}}

Since an input can ingest data both from a local host and external host, we need to add support for removing the host fields:

{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}

And custom processors:

{{#if processors}}
processors:
{{processors}}
{{/if}}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom processing has been intentionally left out of the MVP, it would also require custom mappings.

Also, depending on how this is implemented, this wouldn't require anything on the spec, the pipelines and the mappings could be configured in Fleet and installed by it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few of your ideas to the sample package.

description: Collect your custom log files.
input: logfile
template_path: input.yml.hbs
vars:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have all the settings here, does that mean we won't have any datastream at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any data stream directories, but with Kibana UI you will be able to select/create the target data stream.

@ruflin
Copy link
Collaborator

ruflin commented Apr 29, 2022

Should we add one input package that has an open endpoint like http as an additional example? https://github.com/elastic/package-storage/tree/production/packages/http_endpoint/1.0.1 @andrewkroh Would be good to get your eyes also on this.

@mtojek
Copy link
Contributor Author

mtojek commented Apr 29, 2022

Should we add one input package that has an open endpoint like http as an additional example? https://github.com/elastic/package-storage/tree/production/packages/http_endpoint/1.0.1

I think the more useful would be httpjson and I'm working on that one.

@ruflin
Copy link
Collaborator

ruflin commented Apr 29, 2022

httpjson is useful too. The reason I mentioned the http_endpoint is that it is different from the others as it is a push instead of pull endpoint.

@mtojek
Copy link
Contributor Author

mtojek commented Apr 29, 2022

Ok, I think that 3 packages are sufficient to show our intention with input packages. From a developer perspective, I would say that the biggest difference is in immersing the data stream manifest in the package manifest. Apart from that, this is a relatively similar type to the integration.

I will keep the PR hanging until next week to collect more comments, then will focus on spec files.

@mtojek mtojek changed the title Describe new package type: input Add sample "input" packages Apr 29, 2022
@mtojek mtojek marked this pull request as ready for review April 29, 2022 13:36
@mtojek mtojek requested a review from a team as a code owner April 29, 2022 13:36
@mtojek mtojek requested review from jsoriano and ruflin April 29, 2022 13:37
@mtojek mtojek requested a review from P1llus April 29, 2022 13:37
{{/if}}

{{#if pipeline}}
pipeline: {{pipeline}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this from here. Ok that this is only an example in the config, but it may look like if we support custom processing using pipelines, but is something that we are not doing in the MVP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we already support both processors and ingest pipelines in the current input packages, are we planning on removing them @jsoriano ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have settings in input-like packages that happen to support custom processing, but this is not completely integrated with the solution, and is not enforced to be coherent with other inputs, or to require field mappings for possible new fields that are generated by this processing.
If we follow this way, every input package needs to implement its own way to support custom processing and custom fields. Now that we start with a clean state for this new package type, we would like to provide a more integrated solutions for this.

Custom pipelines support is still an open discussion for packages in general.

We consider that inputs without any custom processing already provide value in use cases of centralized log collection, and we are currently planning to go in this direction for the initial MVP. Next steps after that would be towards supporting custom processing.

It is though a good question how we are going to migrate users from the current input-like integration packages to the new input packages. Depending on the answer we may need to maintain these settings for backwards compatibility, maybe deprecating them, and eventually removing them when we have a complete solution for this.

multi: false
required: true
show_user: false
default: "variables"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played with patterns (as regexp) here #245. I would prefer a list of allowed values :)

@mtojek mtojek merged commit 94f0a96 into elastic:main May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants