Add sample "input" packages #325

mtojek · 2022-04-27T15:07:46Z

This PR is a follow-up on the design document.

I will start with the sample package and once we agree on properties, I will write down the spec.

mtojek · 2022-04-27T15:08:37Z

test/packages/custom_logs/manifest.yml

+categories:
+  - custom
+policy_templates:
+  - name: first_policy_template


I didn't put the elasticsearch block here. Should I add a few or all of the options from here?

Yeah, I wouldn't see a reason not to support the same options. Unless there is something we want to deprecate from the data stream manifests.

So, we have:

elasticsearch.index_template.settings
elasticsearch.index_template.mappings
elasticsearch.index_template.ingest_pipeline.name
elasticsearch.privileges.indices

As we don't know in advance what would be the target index/ingest pipeline, we can't set these properties:
elasticsearch.index_template.ingest_pipeline.name
elasticsearch.privileges.indices

Am I right?
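To make the question concrete, here is a minimal sketch of what a reduced elasticsearch block could look like if only the properties that do not depend on the target index or ingest pipeline were kept. Where the block would sit in an input package manifest and the example values under settings and mappings are assumptions for illustration, not something agreed in this thread.

```yaml
# Hypothetical sketch: elasticsearch block limited to target-independent properties.
elasticsearch:
  index_template:
    settings:
      index:
        number_of_shards: 1   # example value only
    mappings:
      dynamic: false          # example value only
    # ingest_pipeline.name omitted: the target ingest pipeline is not known in advance
  # privileges.indices omitted: the target index is not known in advance
```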

As we don't know in advance what would be the target index/ingest pipeline, we can't set these properties:
elasticsearch.index_template.ingest_pipeline.name
elasticsearch.privileges.indices

Umm, I don't see these being used in the current integrations repo. What are the use cases for these options in integration packages? Could they apply to input packages?

In any case, maybe you are right not to add the elasticsearch block for now, so we start small and see possible use cases later.

Umm, I don't see these being used in the current integrations repo. What are the use cases for these options in integration packages? Could they apply to input packages?

If you can find any package here, then most likely it's APM or Endpoint.

elasticmachine · 2022-04-27T15:10:27Z

💚 Build Succeeded

Build stats

Start Time: 2022-04-29T13:23:28.425+0000
Duration: 3 min 33 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.

ruflin

++ on having 2-3 sample packages to derive the spec.

ruflin · 2022-04-28T07:16:49Z

test/packages/custom_logs/manifest.yml

+        multi: true
+        required: true
+        show_user: true
+        default:


I know this is a sample package but I wonder if an input package should even have defaults here?

I understand that "defaults" can limit the amount of work on the initial configuration. There might be cases where you don't know what to do and would like to kick off the ingesting pipeline with default settings.

I would keep support for default, but I wouldn't put any default paths in the package for custom logs. I think that we can expect that a user knows at least what logs they want to collect :)

An example of a default in a log input package could be to have a variable to support ignore_older, that would default to 0.

This is not a real Custom logs package. Rather, I wanted to present all the options that seem to be valid in terms of the spec.

Well, you can have paths as example without defaults, and ignore_older as example with defaults 🙂
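The suggestion above could be sketched as follows: paths is shown without a default, and ignore_older carries one. The property names (name, type, title, multi, required, show_user, default) follow the existing variable definitions in integration packages; the titles, the text type for ignore_older, and the quoted "0" are assumptions made for this illustration.

```yaml
vars:
  - name: paths
    type: text
    title: Paths                      # assumed title
    multi: true
    required: true
    show_user: true
    # no default: the user is expected to know which logs to collect
  - name: ignore_older
    type: text                        # assumed type
    title: Ignore events older than   # assumed title
    multi: false
    required: false
    show_user: false
    default: "0"                      # default suggested in the discussion
```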

ruflin · 2022-04-28T07:32:30Z

test/packages/custom_logs/manifest.yml

+license: basic
+categories:
+  - custom
+policy_templates:


Do policy templates make sense for input packages? It is really convenient that we can reuse the same logic as we already have so I'm not necessarily asking to change anything here but have a discussion.

Can an input package have more than one policy_template?

Does an input package need all features of policy_templates?

Does an input package need all features of policy_templates?

It looks like we can improve on this. These are the properties available in integration's policy_template:

name

title

categories

description

data_streams

inputs

multiple (?)

icons

screenshot

Clearly, we don't need all of them, so I assumed that this policy_template (a different one in terms of spec) will contain a subset (or a different set) of properties. If you consider a different name for policy_template, do you have any suggestions? maybe an input_template?

Can an input package have more than one policy_template?

Good question, but I believe that the answer might be the same as for policy_templates. In most cases, we will need one. What should we do in the future if we scope this for one only :)?

Does the spec allow reusing the same name while, for example, disallowing certain params when the package is defined as an input package?

There will be totally new spec files here, so yes. We may consider extracting some common parts or referring to the integration type temporarily (for example, to fields properties).

Once we agree on the look, I will write down the spec files. I don't want to iterate on both at the same time as it usually means more work.

jsoriano

Nice.

test/packages/custom_logs/fields/input.yml

jsoriano · 2022-04-28T11:53:06Z

test/packages/custom_logs/manifest.yml

+categories:
+  - custom
+policy_templates:
+  - name: first_policy_template


Yeah, I wouldn't see a reason not to support the same options. Unless there is something we want to deprecate from the data stream manifests.

jsoriano · 2022-04-28T11:55:29Z

test/packages/custom_logs/manifest.yml

+        multi: true
+        required: true
+        show_user: true
+        default:


I would keep support for default, but I wouldn't put any default paths in the package for custom logs. I think that we can expect that a user knows at least what logs they want to collect :)

An example of a default in a log input package could be to have a variable to support ignore_older, that would default to 0.

mtojek · 2022-04-29T08:15:28Z

test/packages/sql_input/manifest.yml

+        multi: false
+        required: true
+        show_user: false
+        default: "variables"


For such cases, it would be great to have a pattern or allowed values. cc @joshdover

I played with patterns (as regexp) here #245. I would prefer a list of allowed values :)
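As a rough illustration of the two options being compared, the variable below is shown once with a list of allowed values and once (commented out) with a regular expression. Both allowed_values and pattern are hypothetical property names used only for this sketch; neither is part of the spec as discussed here. The variable name response_format and the second value "table" are also assumptions.

```yaml
vars:
  - name: response_format      # hypothetical variable name
    type: text
    multi: false
    required: true
    show_user: false
    default: "variables"
    # Option A (hypothetical property): enumerate the accepted values.
    allowed_values:
      - "variables"
      - "table"                # illustrative second value
    # Option B (hypothetical property): constrain with a regular expression.
    # pattern: "^(variables|table)$"
```

A list of values is easier to render as a dropdown in a UI and to validate with a clear error message, which may be why it is preferred over a regexp here.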

mtojek · 2022-04-29T09:37:00Z

Do you think that we need more sample packages or is it enough to start drafting spec files?

P1llus · 2022-04-29T10:11:39Z

test/packages/custom_logs/agent/stream/input.yml.hbs

+{{#each tags}}
+  - {{this}}
+{{/each}}
+


I think there are a few minimum additions to this that need to be in every input package, and many of them are in other integrations as well:

Pipelines:

{{#if pipeline}}
pipeline: {{pipeline}}
{{/if}}

Tags are slightly wrong, they should be like this:

{{#if tags}}
tags:
{{#each tags as |tag i|}}
  - {{tag}}
{{/each}}
{{/if}}

Since an input can ingest data both from a local host and an external host, we need to add support for removing the host fields:

{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}

And custom processors:

{{#if processors}}
processors:
{{processors}}
{{/if}}

Custom processing has been intentionally left out of the MVP; it would also require custom mappings.

Also, depending on how this is implemented, this wouldn't require anything on the spec, the pipelines and the mappings could be configured in Fleet and installed by it.

I added a few of your ideas to the sample package.

test/packages/custom_logs/manifest.yml

P1llus · 2022-04-29T10:13:27Z

test/packages/custom_logs/manifest.yml

+    description: Collect your custom log files.
+    input: logfile
+    template_path: input.yml.hbs
+    vars:


Since we have all the settings here, does that mean we won't have any datastream at all?

There won't be any data stream directories, but in the Kibana UI you will be able to select or create the target data stream.
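Putting together the fragments quoted in this thread, an input package manifest that carries the stream settings itself might look roughly like the sketch below. Lines marked "assumed" do not appear in the quoted hunks and are included only to make the fragment readable.

```yaml
# Sketch of an input package manifest; "assumed" values are not from this thread.
name: custom_logs            # assumed from the test/packages/custom_logs path
type: input                  # assumed marker for the new package type
license: basic
categories:
  - custom
policy_templates:
  - name: first_policy_template
    description: Collect your custom log files.
    input: logfile
    template_path: input.yml.hbs
    vars:
      - name: paths          # assumed variable name
        type: text           # assumed
        multi: true
        required: true
        show_user: true
```

The point of the layout is that input, template_path and vars, which an integration would declare in a data stream manifest, live directly in the policy template.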

Co-authored-by: Marius Iversen <[email protected]>

ruflin · 2022-04-29T13:04:49Z

Should we add one input package that has an open endpoint like http as an additional example? https://github.com/elastic/package-storage/tree/production/packages/http_endpoint/1.0.1 @andrewkroh Would be good to get your eyes also on this.

mtojek · 2022-04-29T13:15:47Z

Should we add one input package that has an open endpoint like http as an additional example? https://github.com/elastic/package-storage/tree/production/packages/http_endpoint/1.0.1

I think httpjson would be more useful, and I'm working on that one.

ruflin · 2022-04-29T13:35:20Z

httpjson is useful too. The reason I mentioned the http_endpoint is that it is different from the others as it is a push instead of pull endpoint.

mtojek · 2022-04-29T13:36:19Z

Ok, I think that 3 packages are sufficient to show our intention with input packages. From a developer perspective, I would say that the biggest difference is in embedding the data stream manifest in the package manifest. Apart from that, this type is relatively similar to the integration type.

I will keep the PR open until next week to collect more comments, then I will focus on the spec files.

jsoriano · 2022-04-29T15:04:55Z

test/packages/custom_logs/agent/stream/input.yml.hbs

+{{/if}}
+
+{{#if pipeline}}
+pipeline: {{pipeline}}


I would remove this from here. I know this is only an example in the config, but it may look as if we support custom processing using pipelines, which is something that we are not doing in the MVP.

But we already support both processors and ingest pipelines in the current input packages, are we planning on removing them @jsoriano ?

We have settings in input-like packages that happen to support custom processing, but this is not completely integrated with the solution, is not enforced to be coherent with other inputs, and does not require field mappings for possible new fields generated by this processing.
If we continue this way, every input package needs to implement its own way to support custom processing and custom fields. Now that we start with a clean slate for this new package type, we would like to provide a more integrated solution for this.

Custom pipelines support is still an open discussion for packages in general.

We consider that inputs without any custom processing already provide value in use cases of centralized log collection, and we are currently planning to go in this direction for the initial MVP. Next steps after that would be towards supporting custom processing.

It is a good question, though, how we are going to migrate users from the current input-like integration packages to the new input packages. Depending on the answer, we may need to maintain these settings for backwards compatibility, maybe deprecating them and eventually removing them when we have a complete solution for this.

jsoriano · 2022-04-29T15:10:42Z

test/packages/sql_input/manifest.yml

+        multi: false
+        required: true
+        show_user: false
+        default: "variables"


I played with patterns (as regexp) here #245. I would prefer a list of allowed values :)

mtojek added 2 commits April 27, 2022 16:25

Basic file structure

f8448fa

Add properties

e1c2f0b

mtojek requested review from ruflin and joshdover April 27, 2022 15:07

mtojek self-assigned this Apr 27, 2022

mtojek commented Apr 27, 2022

Fix: link

445f5d7

ruflin reviewed Apr 28, 2022

mtojek requested a review from jsoriano April 28, 2022 09:51

jsoriano reviewed Apr 28, 2022

mtojek added 2 commits April 29, 2022 09:43

Prepare SQL input

aa17a16

Fix: response formats

333aee9

mtojek commented Apr 29, 2022

ruflin mentioned this pull request Apr 29, 2022

[Change Proposal] Ability to add privileges for all data streams of a specific type #315

Closed

Adjust custom logs

4365099

P1llus reviewed Apr 29, 2022

mtojek and others added 2 commits April 29, 2022 13:09

Address PR comments

268c69b

Update test/packages/custom_logs/manifest.yml

132b3d1

Co-authored-by: Marius Iversen <[email protected]>

Add httpjson_input

fb552ca

mtojek changed the title ~~Describe new package type: input~~ Add sample "input" packages Apr 29, 2022

mtojek marked this pull request as ready for review April 29, 2022 13:36

mtojek requested a review from a team as a code owner April 29, 2022 13:36

mtojek requested review from jsoriano and ruflin April 29, 2022 13:37

mtojek requested a review from P1llus April 29, 2022 13:37

jsoriano approved these changes Apr 29, 2022

mtojek merged commit 94f0a96 into elastic:main May 4, 2022

This was referenced May 4, 2022

Write input package spec #328

Merged

Support “input” package type #319

Closed
