
Proposal: Beat "presets" #3190

Closed
sebgl opened this issue Jun 5, 2020 · 14 comments
Assignees
Labels
>enhancement Enhancement of existing functionality

Comments

@sebgl
Contributor

sebgl commented Jun 5, 2020

I'd like to discuss a proposal here to change the logic of the current Beat CRD a bit.

Concerns

I want to address the following concerns:

  • if you want to just grab logs from a mounted /data/logs directory with filebeat, you end up by default with many additional things and permissions you don't need:
    • a dedicated service account whose token is mounted in the pod
    • a cluster role binding for that service account that can access the k8s API
    • HostNetwork = true
    • a security context to run with the root user (id 0)
    • system volumes mounted in:
      • /var/lib/docker/containers
      • /var/log/containers
      • /var/log/pods
  • similarly with metricbeat, if you just want to grab metrics from a prometheus http endpoint, you still end up with:
    • similar RBAC permissions, service account, root user, host network
    • system volumes mounted in:
      • /var/run/docker.sock
      • /proc
      • /sys/fs/cgroup
  • some of those defaults can be overridden in the podTemplate, but some of them are harder to override: for example, the default volumes we create for Filebeat and Metricbeat (e.g. /proc) are hard to remove for users who don't want them
  • with Metricbeat we default to a DaemonSet collecting metrics from all hosts, but we could also have chosen to default to a Deployment collecting kube-state-metrics (both are listed in the Metricbeat docs), or to a Deployment collecting Prometheus endpoints
  • the story around config merging is complicated

Proposal

  • keep defaults very light (no mounted volumes, no privileges, no RBAC, no config) for Filebeat and Metricbeat, similar to what we do for any generic "other Beat", treating all Beats the same. We only set the ES output config if (and only if) elasticsearchRef is specified, but don't set any other config setting:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  daemonSet:
    podTemplate:
      # mount whatever volume you need here and add privileges if needed
  config:
    # whatever you need here, we just add the es output to it
  • Introduce "presets" to cover default podTemplate, RBAC and configuration for a small set of identified use cases

filebeat-k8s-autodiscover: collect hint-based logs from all Pods, autodiscovered from the apiserver (also handles service account, role binding, etc.)

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  preset: filebeat-k8s-autodiscover

filebeat-k8s-containers: collect logs from all containers - does not rely on autodiscover from the apiserver

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  preset: filebeat-k8s-containers

metricbeat-k8s-hosts: collect metrics from all k8s nodes

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  preset: metricbeat-k8s-hosts

metricbeat-k8s-cluster-metrics: collect metrics from the k8s cluster (in a single metricbeat instance)

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  preset: metricbeat-k8s-cluster-metrics
  • the podTemplate of all those presets can of course be overridden

  • if you want to include a lot of modifications it's probably easier to not use the preset and set your own podTemplate, config, and RBAC stuff

  • it gives us room to include other presets later on (for example, for Packetbeat) while keeping the current Packetbeat defaults unchanged. Otherwise, the day we decide to set a Packetbeat default, we'd break compatibility with existing Packetbeat resources.

  • I think we can tackle the config merging problem when a preset is used by allowing users to override a few identified array items. Keep our current "merge-append" behaviour to be consistent with other CRDs, but allow some specific overrides (typed in the CRD):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  preset: metricbeat-k8s-hosts
  configOverrides:
    modules:
    # system modules from the preset removed (user-provided ones are appended instead)
    - system
    # kubernetes modules from the preset removed (no user-provided one)
    - kubernetes
    autodiscover.providers.type:
    # kubernetes provider from the preset removed (user-provided ones appended instead)
    - kubernetes
  config:
    # the usual append-merge behaviour
    modules:
    - module: system # replace ours since specified in `configOverrides.modules`
      period: 5s
      metricsets:
      - cpu
      - load
    - module: system # replace ours since specified in `configOverrides.modules`
      period: 1m
      metricsets:
      - filesystem
    # append-merge this new setting
    setup.dashboards.enabled: true
  • in case no preset is used, there's no problem with the config merge-append behaviour, since our config is empty anyway (it just contains the ES output if elasticsearchRef is specified)
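As a sketch of the proposed override semantics, the effective config produced by the example above could look like the following (assuming the `metricbeat-k8s-hosts` preset defines `system` and `kubernetes` modules; the `configOverrides` mechanism itself is hypothetical at this point):

```yaml
# Hypothetical effective config after applying configOverrides (sketch):
# the preset's `system` and `kubernetes` modules are dropped because they are
# listed under configOverrides.modules; user-provided modules take their place.
modules:
- module: system
  period: 5s
  metricsets:
  - cpu
  - load
- module: system
  period: 1m
  metricsets:
  - filesystem
# settings not listed under configOverrides are merge-appended as usual
setup.dashboards.enabled: true
```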
@sebgl sebgl added the >enhancement Enhancement of existing functionality label Jun 5, 2020
@david-kow
Contributor

Thanks for the write up @sebgl! My 2 cents below.

I think the concerns you listed can be divided into two groups:

Defaults

The default config comes with a podTemplate that is not suitable if a different config is used.

To have Beats working there are many moving parts, but I think that's just the nature of the problem. Still, many of the things listed in the issue will be addressed.

If we still feel that's not enough when users want to do the setup themselves, we could provide a flag (in the daemonSet and deployment specs, or at the top level) that would turn off all defaults and cause the plain user podTemplate to be used.

Config merging is complicated

I think this is orthogonal to presets. We can have the proposed configOverrides with or without presets and we can have presets with or without configOverrides. I'd propose to tackle this separately. This solution has similar issues as some other ones we discussed in #3042.


IIUC, the proposal is about providing a number of default configs per Beat to address different use cases. It was considered already (although in a bit different form):

We could have new CRDs that map directly to covered scope items that were defined above, ie. have one CRD for cluster monitoring and another one for application monitoring. This seems to enable us to default Beats configs in a way that doesn’t require a lot customization to work, ie. “almost empty” manifest would default to enable all pieces to achieve Kubernetes cluster monitoring. On the other hand this is a very different route from one we took with existing CRDs. It seems that making this customizable would be difficult too.

I do see a couple more issues:

  • how do we pick the use cases to support, and how do we know in which direction to make them customizable? I'm afraid it will be difficult to come up with configurations that suit everyone, and if they need to be altered in any way, users are again in the territory of looking at the defaults, diffing, and trying to figure out the right merge/override

  • do we know if this is going to be a preferable way of running Beats? We would be making a big promise (that it works and will be supported going forward) and I'm not sure this is something many users want

  • I think maintaining a single default per Beat is already a stretch, and having more will just grow the test matrix. Since this would be a field in the CRD with well-known values, it would be expected to work across all dimensions we already have today. This, it seems to me, could be a big maintainability burden: beats x presets x k8s versions x stack versions x secured/non-secured x OCP/non-OCP. We would be in the business of maintaining a number of Beats configs similar to (larger than?) what the Beats team itself maintains.

Maybe we can do something a little bit different and achieve the same goals. The way I see it, a preset is a beatConfig+podTemplate for a particular Beat type. Which is what Beat CR itself is :)

Could we get similar UX by just providing recipes with Beat CR for different scenarios? This would be similar to current recipes (not production ready, might require hardening, we can change them without figuring out upgrade story, etc.). Users could use it as a starting point and tune it as needed, which is what we expect would happen with presets too.

We can also decide to postpone adding this until we get some feedback. It seems it can be added at any time.

@anyasabo
Contributor

anyasabo commented Jun 5, 2020

I think we can tackle the config merging problem in case a preset is used, by allowing to override a few identified array items

In your proposal I think it might also make sense to make the config and preset fields mutually exclusive. You either use our presets or you provide the whole config (minus the output I suppose).

do we know if this is going to be a preferable way of running Beats? We would be making a big promise (that it works and will be supported going forward) and I'm not sure if this is something desirable by many users

I don't quite understand this issue since we are already committing to this for each beat default in the current proposal.

hostNetwork is arguably a good default - elastic/beats#15013 , fairly easy to switch off if needed

It's not clear to me that this is actually a good default, it seems like using the downward API to get the node name provides the same thing without any of the downsides / permissions issues.
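For reference, the downward API pattern mentioned here exposes the node name to the Pod without requiring hostNetwork; a minimal sketch against the Beat podTemplate (container name and env var name are illustrative):

```yaml
# Sketch: obtain the node name via the standard Kubernetes downward API
# instead of relying on hostNetwork. Field names follow the core PodSpec.
spec:
  daemonSet:
    podTemplate:
      spec:
        containers:
        - name: filebeat        # illustrative container name
          env:
          - name: NODE_NAME     # illustrative env var name
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
```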

@sebgl
Contributor Author

sebgl commented Jun 8, 2020

Summary of a discussion we had with @david-kow @anyasabo @pebrc:

  • presets can be seen as opt-in explicit defaults, as opposed to implicit defaults in the current implementation
  • the main benefit is being able to more easily specify config and podTemplate for use cases completely different from the main default (e.g. Metricbeat polling an HTTP endpoint instead of collecting host-level metrics), in which case opting out of the default could be hard (volume mounts, RBAC, privileges, etc.)
  • we don't want to commit to implement (too) many presets for each beat. 2 presets for metricbeat (daemonset for host metrics & deployment for k8s metrics) and 1 for filebeat (autodiscover pod logs) sounds like a good starting point. The risk is to slowly move towards a helm approach where there are many templates and parameters users can pass.
  • there is still value in users being able to completely override the config for a given preset. For example, with the Metricbeat preset they may be happy to keep the existing RBAC & volume mount stuff, but just want to override one tiny detail in the config.
  • as @david-kow said, the config overrides problem is orthogonal to the presets idea. We may want to implement it at some point, but for now it seems fair that the user either relies on the preset config or provides their own config entirely (no overrides, except the ES output).

@pebrc
Collaborator

pebrc commented Jun 8, 2020

The minimum prerequisite for introducing this feature would be to remove any existing defaults, volume mounts and RBAC settings from #3041 (or a quick follow-up PR).

We can then in a separate decision introduce the presets (or play around with sample configurations external to the controller)

It would also be helpful to ask for feedback for the CRD and this idea in particular from @\elastic/integrations-platforms

@sebgl
Contributor Author

sebgl commented Jun 8, 2020

The minimum prerequisite for introducing this feature would be to remove any existing defaults, volume mounts and RBAC settings from #3041 (or a quick follow-up PR).
We can then in a separate decision introduce the presets (or play around with sample configurations external to the controller)

If we remove this default, isn't the Beat CRD basically useless or very hard to use?
I think I'd rather keep it and have a future preset PR refactor it into an opt-in preset instead.

@pebrc
Collaborator

pebrc commented Jun 8, 2020

If we remove this default, isn't the Beat CRD basically useless or very hard to use?
I think I'd rather keep it and have a future preset PR refactor it into an opt-in preset instead.

You could have example configs to go with it. I think I am OK with removing the defaults together with the preset PR if it is before an initial release. Once we release the Beats CRD even if it is still alpha, removing the defaults would be a breaking change, that we should avoid if possible.

@anyasabo
Contributor

anyasabo commented Jun 8, 2020

I think I agree with pebrc.

I am still wrestling with the value of having the presets built in to the controller, vs starting out with having an example library of common use cases in our docs and potentially adding presets later. My inclination is to start smaller with a basic beat. Adding things is easier than taking them away in the future.

It also makes me wonder if stack monitoring would still make sense as a separate CRD or if it makes sense to include as a preset (either in documentation or codified).

@david-kow
Contributor

I agree on removing the defaults, but I think we should have presets (or some other way?) to specify a working config.

All our other CRDs offer a great startup experience with 10-liners specifying everything needed for a simple quickstart. Meanwhile, Beat will need a fairly complicated, unreadable 50-150 lines of config, which we will probably end up hiding behind kubectl apply -f https://....

Also, having a preset (or any other built-in config mechanism) together with configRef enables a workflow where users start with the default and get an editable Secret created by the operator.
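For illustration, the configRef workflow could look something like this (a sketch; the Secret name is an assumption, and the name field is left empty as in the other examples):

```yaml
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  configRef:
    secretName: filebeat-config   # illustrative: a Secret holding the Beat config,
                                  # editable by the user after the operator creates it
```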

@jsoriano
Member

jsoriano commented Jun 9, 2020

Hi, sorry if I am joining late to this discussion as my thoughts are going to be a bit against "presets" 😅 Also I am not familiar with current operator, so I may be missing some basic things that may have already been discussed.

I would like to frame my comments around the deploy manifests we currently maintain in the Beats repo (here) and how they tend to be used: usually as they are, customized as needed following comments in the same YAML file. One way of supporting Beats in the operator could be to move the knowledge in these manifests and their comments into the controller.

For example, one thing we note in the Filebeat manifest is that to use autodiscover, you need to remove the filebeat.inputs configuration and uncomment the filebeat.autodiscover block. I wonder if this could be modeled with a setting:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  autodiscover: true

Another thing that is usually customized is the logs path; this could be another setting:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  autodiscover: true
  containers_logs:
    path: "/var/log/containers/"

Another common need is to enable or disable some things when the Beat is deployed in a secured Kubernetes distribution, like OpenShift. For example, Filebeat needs to be privileged to read the logs in some environments. Maybe the operator is able to detect this and set the proper options; if not, a setting for this may be needed:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  autodiscover: true
  containers_logs:
    path: "/var/log/containers/"
  privileged: true

Covering just the combinations of these three quite popular variations would require lots of presets, but they can be modelled as a few clear options. Support for these options could be added in different iterations.
I see the risk of ending up with too many parameters and templates, but we could support only a small set of well-known, opinionated use cases, and not all possible configurations of Beats. We could even prevent certain error-prone combinations from being used, like autodiscover templates and hints-based autodiscover at the same time.

if you want to just grab logs from a mounted /data/logs directory with filebeat,

A case like this could be covered with options like this:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  autodiscover: false
  containers_logs:
    enabled: false
  inputs:
    - type: log
      paths:
        - /data/logs

The user could specify the path as it is on the host, and the operator could then mount /data/logs as a volume in the Filebeat container and generate a Filebeat config with a single input pointing at the correct path. It could use the other options to decide whether to grant more or fewer privileges and whether to mount other volumes.
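The volume handling described above could translate into something like the following generated PodSpec fragment (a sketch using standard Kubernetes hostPath volumes; the container and volume names are illustrative):

```yaml
# Sketch of what the operator could generate for a host-mounted logs directory.
spec:
  containers:
  - name: filebeat               # illustrative container name
    volumeMounts:
    - name: data-logs            # illustrative volume name
      mountPath: /data/logs
      readOnly: true
  volumes:
  - name: data-logs
    hostPath:
      path: /data/logs
```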

One particularity of Metricbeat is that we recommend deploying both a DaemonSet and a Deployment for it. The DaemonSet is intended to collect metrics locally on each node, and the Deployment to collect cluster metrics from kube-state-metrics. A config like the following one could generate only a DaemonSet with hints-based autodiscover, and nothing for kube-state-metrics:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  autodiscover:
    hints: true
  kube_state_metrics: false

Also, something that may have been discussed in the past: I wonder if we have considered having a kind of CRD for each well-supported Beat, so we could have specific options for specific Beats (e.g. inputs only makes sense for Filebeat).

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Filebeat
spec:
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  autodiscover: false
  containers_logs:
    enabled: false
  inputs:
    - type: log
      paths:
        - /data/logs

We could still have the Beat CRD if needed for completely custom configurations. The problem I see with offering completely custom configuration is that users may wonder why they should use an operator if they still need to provide whole configs.

@david-kow
Contributor

Thanks for your input @jsoriano, the timing is perfect :)

I see the value in making Beat config more customizable from the CRD itself. It would allow us to address a certain percentage of common setups with short and simple configs. At the same time, it requires us to draw an arbitrary line between what's easily customizable and what isn't. I'm not sure that's something we should do with Beats; we didn't do it with any of our other CRDs.

For context, having a generic Beat CRD was driven by the desire to support all Beats at once (including community Beats) instead of a (probably limited) number of very similar CRDs/controllers. Furthermore, I've spent some time thinking about "lifting" Beat config options to the CRD spec, and my biggest concerns at the time were:

  • we end up with a big bag of settings that, depending on the Beat type, might be ignored, conflict with each other, or depend on each other. I don't think we could easily make it obvious what works together and what doesn't
  • fields like privileged or container_logs would have two sources of truth: the podTemplate and the CRD. So far we have tried to avoid being in that situation
  • some fields would have no meaning for some Beats, eg:
    -- no autodiscover for journalbeat, packetbeat, auditbeat, heartbeat (at least not documented)
    -- kube_state_metrics only for metricbeat
  • it would require us to make arbitrary choices about what to support. How do we determine what is worth putting in the CRD directly? What would be our answer to 'why is setting x not in the CRD'?
  • expanding support could be a rabbit hole
  • making sure all combinations work as expected across all Beats, all stack versions, all k8s versions, and all k8s flavours seems like a challenge and would likely be very resource-consuming

The above is mostly to give some more context why we chose this path initially :)

The problem I see with offering completely custom configuration is that users may wonder why using an operator if they still need to provide whole configs.

I think this is a very valid question. Even if the user provides the whole Beat config, the operator takes care of:

  • setting the output to ES
  • setting up default Kibana dashboards
  • cert rollover
  • pods rollover in case of config/version change
  • putting Secrets into Beat keystore

And this is on top of having a "configless" startup experience if they choose to use a preset (if we decide to have one).
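The keystore point above could look roughly like the secure-settings mechanism used elsewhere in ECK; a sketch (the secureSettings field name mirrors other ECK CRDs and is assumed here, as is the Secret name):

```yaml
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  secureSettings:
  - secretName: filebeat-secret-settings   # illustrative: entries of this Secret
                                           # would be loaded into the Beat keystore
```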

I wonder if this "simplified" config approach was ever considered on the Beat config level (simple.autodiscover: true would translate to some default autodiscover setup, etc.).

@jsoriano
Member

jsoriano commented Jun 10, 2020

  • fields like privileged or container_logs would have two sources of truth: podTemplate and the CRD. So far we tried to avoid being in that situation

+1 to avoid multiple sources of truth. If we support a limited number of options in the CRD, we probably shouldn't allow adding custom podTemplates. Maybe a solution for that could be to have a generic Beat CRD that only supports podTemplates, and a set of Beat-specific CRDs that don't support podTemplates but do support the specific higher-level options.

  • we end up with a big bag of settings that, depending on the Beat type, might be ignored, conflict with each other, or depend on each other. I don't think we could easily make it obvious what works together and what doesn't

  • some fields would have no meaning for some Beats, eg:
    -- no autodiscover for journal, packet, audit, heartbeats (at least not documented)
    -- kube_state_metrics only for metricbeat

Wouldn't these points speak in favour of having specific CRDs for each supported Beat? Then, if autodiscover is only supported in Filebeat and Metricbeat, we only support the autodiscover option in the CRDs for those Beats.

  • it would require us to make arbitrary choices about what to support. How do we determine what is worth putting in the CRD directly? What would be our answer to 'why is setting x not in the CRD'?

  • expanding support could be a rabbit hole

  • making sure all combinations work as expected across all Beats, all stack versions, all k8s versions, and all k8s flavours seems like a challenge and would likely be very resource-consuming

Yeah, this can be true, but I guess this comes down to a decision on the use cases we want to support in the operator. I think it can make sense to support only a limited number of options. The space of possible Beats option combinations is wide, and some combinations are error-prone (I am thinking of autodiscover hints vs. templates, for example, but also of which Beats make sense as Deployments, as DaemonSets, or as both). I think we don't need to support all combinations for the Kubernetes use case.

I think this is a very valid question. Even if the user provides the whole Beat config, the operator takes care of:

  • setting the output to ES
  • setting up default Kibana dashboards
  • cert rollover
  • pods rollover in case of config/version change
  • putting Secrets into Beat keystore

Oh, this is great 👍

I wonder if this "simplified" config approach was ever considered on the Beat config level (simple.autodiscover: true would translate to some default autodiscover setup, etc.).

We try to provide defaults covering basic use cases so configuration can be simpler. For example, enabling hints-based autodiscover in Filebeat enables collection of logs for all Pods; this works with the kubernetes and docker providers:

filebeat.autodiscover.providers:
  - type: kubernetes
    hints.enabled: true

This can be done in Filebeat because all logs are collected the same, but it wouldn't work on Metricbeat, where each container is going to need a different way of collecting metrics.

Another example would be the add_*_metadata processors: they are not needed when autodiscover is used, because it adds the same metadata. Also, add_cloud_metadata autodetects the cloud provider, so it doesn't need to be explicitly set.
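For context, the processors mentioned above appear in a typical non-autodiscover Beats config like this (a sketch; with autodiscover enabled, the equivalent metadata is attached automatically and this block can be dropped):

```yaml
# Typical processors block when autodiscover is NOT used.
processors:
- add_kubernetes_metadata: {}
- add_cloud_metadata: {}   # autodetects the cloud provider, no explicit settings needed
```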

@david-kow
Contributor

I think it comes down to how much product-specific complexity we can handle. I think you are right that many of the problems I mentioned can be solved by having a separate kind for each Beat and a generic one for community Beats. But besides those benefits, it also introduces another layer of abstraction, which is fine by itself, but requires users to learn and understand it.

We've looked at all the options (implicit defaults, explicit presets, no defaults) and decided to start with no defaults. We already had some people +1'ing that approach. We will see what the broader feedback is and consider improvements based on that.

You can see the gist for what full Filebeat/Metricbeat examples would look like.

@roncohen

Super late to this, and far from a Kubernetes expert, but: I understand the reasoning for going with no defaults here. I still like the idea of presets. Could we instead implement them just as documentation? "Cookbook" examples of how to get started with the operator and a specific use case, e.g. collecting logs from all containers, supplying an example config file users can just "apply" to get up and running with that use case? In short, a one-step guide to collecting logs from all containers, one step to collecting metrics from hosts, etc.

@pebrc
Copy link
Collaborator

pebrc commented Jun 17, 2020

Could we instead implement them just as documentation?

That's the plan for the time being. We already have a collection of recipes that we want to update for the Beats CRD.
