
feat(new transform): Initial kubernetes_pod_metadata transform #1888

Merged
42 commits merged into master from kubernetes_transform on Apr 1, 2020

Conversation

ktff
Contributor

@ktff ktff commented Feb 21, 2020

Closes #1072
Ref #1867

Implementation notes:

  • Supported fields:
    • name
    • namespace
    • creation_timestamp
    • deletion_timestamp
    • labels
    • annotations
    • node_name
    • hostname
    • priority
    • priority_class_name
    • service_account_name
    • subdomain
    • host_ip
    • ip
  • Fields are added under the log schema kubernetes_key, which defaults to kubernetes. See the example config after these notes.
  • Tests and the example config use RBAC authorization to allow Vector access to the API server. RBAC is usually enabled by default, but this requirement should be documented.
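
A hypothetical configuration sketch of how this fits together (the transform type comes from this PR's title; exact option syntax and input names are illustrative, not confirmed here):

```toml
# Illustrative sketch; exact option syntax is not confirmed by this PR.
[log_schema]
  kubernetes_key = "kubernetes"    # default; pod metadata lands under this key

[transforms.pod_metadata]
  type = "kubernetes_pod_metadata"
  inputs = ["k8s"]                 # e.g. a kubernetes source
```

An event passing through would then carry fields such as kubernetes.name, kubernetes.namespace, and kubernetes.node_name.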

Open questions

  • Is kubernetes the best default value for the log schema kubernetes_key?

  • Is this a breaking change because the log schema has changed? (Edit: it will not be, after fix(config): Partial LogSchema #1923.)

  • How should we deal with Event interpreting label/annotation keys like "app.kubernetes.io/name" as nested? (See the sketch below.)
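
To illustrate that last concern (a hypothetical rendering of the problem, not confirmed behavior): if the dots in a label key are treated as path separators, the label app.kubernetes.io/name=my-app could be inserted as

```json
{
  "kubernetes": {
    "labels": {
      "app": {
        "kubernetes": {
          "io/name": "my-app"
        }
      }
    }
  }
}
```

rather than as the intended flat key "app.kubernetes.io/name": "my-app".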

Todo

  • After this is merged, open an issue to update kubernetes_source naming scheme to use kubernetes_key. This will be a breaking change.

  • Add delete timeout.

  • Add jitter.

@ktff ktff added this to the Initial containers support milestone Feb 21, 2020
@ktff ktff mentioned this pull request Feb 21, 2020
@binarylogic
Contributor

@ktff I added #1867 as a closing issue here. Please remove if that is not the case, but it looks like it at least partially resolves it?

ktff added 6 commits February 24, 2020 16:15
@ktff ktff force-pushed the kubernetes_transform branch from c7bdc51 to 707f86b Compare February 24, 2020 15:29
@ktff
Contributor Author

ktff commented Feb 24, 2020

@binarylogic

but it looks like it at least partially resolves it?

Yes, these two:

  • Add a log_schema.kubernetes_key option
  • Use this key in the kubernetes_metadata transform

@binarylogic
Contributor

binarylogic commented Feb 24, 2020

@ktff would you mind adding docs?

Also, do we have a setting that controls the refresh rate against the k8s API? I ask because I have spoken to users who have experienced issues with this in fluent*. For example, very large clusters (5,000+ pods) effectively DDoS'd the k8s API. A configuration option should allow users to reduce the refresh rate.

ref fluent/fluent-bit#1399

@ktff
Contributor Author

ktff commented Feb 25, 2020

@binarylogic good question. We currently ask for metadata proactively only once and then passively wait for the API server to send us changes, so we don't have a refresh rate, but we have something similar.

We are using the watch endpoint to listen for metadata changes for pods on the local node, so there are basically two states:

  1. Connecting to the API server.
    Exposing retry_timeout as a configuration option would enable users to decrease the pressure on the API server in large clusters. Also, because this state suffers when a large number of Vector instances try to connect at the same time, deploying Vector with this transform on a large cluster all at once is problematic.
    To mitigate that, two more changes would help:

    • Make the timeout duration probabilistic. A uniform distribution on the range [0, 2*retry_timeout] would have the largest deviation, a median of retry_timeout, and also be indirectly dependent on the size of the cluster. This would spread connection retries out over time, which would spread our pressure on the API server more evenly. (A sketch follows below.)
    • Apply the timeout even when connecting for the first time.
  2. Waiting for changes from the API server.
    This shouldn't be problematic because it's either self-regulating or an issue that needs to be solved with an extra feature when it arises.

It sounds reasonable to me to add this, as it's a small change with great benefits.
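
A minimal sketch of the probabilistic timeout described above (a hypothetical helper, not the actual implementation; assumes the rand crate):

```rust
use rand::Rng;
use std::time::Duration;

// Uniform jitter over [0, 2 * retry_timeout]: the median stays at
// retry_timeout, while the spread staggers reconnect attempts across
// many Vector instances so the API server isn't hit all at once.
fn jittered_timeout(retry_timeout: Duration) -> Duration {
    let max_ms = retry_timeout.as_millis() as u64 * 2;
    Duration::from_millis(rand::thread_rng().gen_range(0..=max_ms))
}
```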

@LucioFranco what do you think?

Contributor

@LucioFranco LucioFranco left a comment


Looking good. I left a few comments inline.

@LucioFranco
Contributor

LucioFranco commented Feb 26, 2020

So I have two main questions:

  1. How does using the watch client affect us? It means we will need to keep a TCP connection open to the API server for the lifetime of the Vector instance. I don't think this should be a large issue, but I have seen other metadata transforms provide the option to use either the watch or a poll-based API. (See the comparison after this list.)

  2. I think the biggest problem with large-scale deployments would be Vector trying to hit the API server all at once. We should introduce some sort of jitter for the initial request and subsequent requests so we can avoid the thundering herd problem.
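
For reference, the difference at the Kubernetes API level (standard endpoints; the field selector is how a per-node agent would scope the query to its own pods):

```
GET /api/v1/pods?fieldSelector=spec.nodeName=<node>          # poll: one-shot list, connection closes
GET /api/v1/pods?watch=1&fieldSelector=spec.nodeName=<node>  # watch: long-lived connection streaming change events
```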

I think this follows closely what you said, @ktff. What do you think?

Note: we may want to add jitter to our retry policy as well so that our other sinks can benefit from the same.

https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/?did=ba_card&trk=ba_card

@ktff
Contributor Author

ktff commented Feb 26, 2020

@LucioFranco

Yep, we are thinking the same thing. I'll add the jitter, and we could open an issue for adding it to the retry policy and possibly extracting this implementation.

I suspect poll is useful for use cases where the used portion of the metadata won't change, so fetching it only once makes sense. Other than that, I think we need to wait and see. I also didn't find any Fluent issue regarding watch vs. poll.

@LucioFranco
Contributor

@binarylogic do you have any more info you could provide us about the issues you were describing?

@alexgavrisco
Contributor

This PR adds everything we need in order to go with Vector in EKS! Thanks a ton!
I'm going to test Vector 0.8.x today on a small cluster; however, for production use we really need the ability to filter by pod_name/namespace, etc.
Is there any chance this feature will be released with 0.9? No pressure, I just want to set correct expectations on my end.

@ktff
Contributor Author

ktff commented Mar 25, 2020

@Alexx-G Great!

If you are using Kubernetes v1.14 or higher, the kubernetes source will output pod_name and pod_namespace, although those names will change in the coming 0.9 version, which will most likely include this transform.

@alexgavrisco
Contributor

alexgavrisco commented Mar 25, 2020

@ktff Thanks for the update!
Yesterday I tested 0.8.x on an EKS cluster, and technically there's enough metadata (container_name, pod_name, and namespace) for us to settle on Vector. I tested it in a relatively small cluster; once it passes our validation phase, we'll see how it behaves on bigger clusters (100-200 worker nodes).
Since this feature makes API calls to Kubernetes, the Kubernetes deployment must have corresponding ClusterRole bindings, right?

@binarylogic
Contributor

Thanks for all of your help @Alexx-G

Since this feature makes API calls to Kubernetes, the Kubernetes deployment must have corresponding ClusterRole bindings, right?

@ktff if that is the case, would you mind updating https://github.com/timberio/vector/issues/1450 to include details on that? I'm working on the k8s docs now.

@alexgavrisco
Contributor

@binarylogic Thank you for your effort! I'd gladly offer more help.
Today I'll whitelist more services for Vector to collect logs from and see how it goes, but from my previous tests I don't expect any issues there.
Once I finish integrating Vector, I'll see if I can contribute some feedback to #1397 (since it should contain all the resources Vector needs to work properly in a Kubernetes cluster).

@ktff
Contributor Author

ktff commented Mar 27, 2020

@Alexx-G

Since this feature makes API calls to Kubernetes, the Kubernetes deployment must have corresponding ClusterRole bindings, right?

This transform does, yes, and its ClusterRoleBinding is in this PR's YAML.

@binarylogic

if that is the case, would you mind updating #1450 to include details on that? I'm working on the k8s docs now.

ClusterRole bindings are part of vector-daemonset.yaml, and beyond having RBAC authorization enabled, they don't require other user intervention. This is documented in the YAML, along with other such necessities, so just pointing to the YAML as the go-to for more in-depth information should be enough.

RBAC authorization is documented in this transform's documentation, as it's the one using it. The source can do without it. (A minimal sketch of the RBAC objects involved follows.)
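
For reference, a minimal sketch of that kind of access (names and namespace here are illustrative; vector-daemonset.yaml in this PR is the authoritative version):

```yaml
# Illustrative sketch only; see vector-daemonset.yaml for the real manifests.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector
rules:
  - apiGroups: [""]               # core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector
subjects:
  - kind: ServiceAccount
    name: vector
    namespace: default
```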

@binarylogic
Contributor

Thank you. That makes sense.

ktff added 3 commits April 1, 2020 11:04
@ktff
Contributor Author

ktff commented Apr 1, 2020

@binarylogic could you review the docs? Also, I'm not sure how to fix the error in make generate / ci/circleci: check-blog.

@binarylogic
Contributor

Yep. I'm on it today. Once I finish the docs, this is good to merge, correct?

Contributor

@binarylogic binarylogic left a comment


Docs are done! Looks really good.

@binarylogic
Contributor

I am making a few more documentation changes and will merge afterwards.

@LucioFranco
Contributor

@ktff can we add a test verifying that labels are inserted correctly and that all other values are inserted as we expect?

@binarylogic binarylogic merged commit 9add2cb into master Apr 1, 2020
@binarylogic binarylogic deleted the kubernetes_transform branch April 1, 2020 17:42
@binarylogic
Contributor

Nice work @ktff and @LucioFranco.

@binarylogic binarylogic added the type: feature label (a value-adding code addition that introduces new functionality) and removed the type: new feature label Jun 16, 2020
@binarylogic binarylogic removed this from the Initial Containers Support milestone Jul 26, 2020
@marcbachmann

Did this get removed from master because of performance issues?

@jszwedko
Member

jszwedko commented Oct 5, 2021

Hi @marcbachmann. This was removed since it is now implemented by the kubernetes_logs source. We have discussed bringing it back in the future, though, to separate the reading of logs from the annotation of metadata.
