Dynamic pipelines - a new foreach block #1480

Open: wants to merge 31 commits into main

ptodev (Collaborator) commented Aug 15, 2024

PR Description

This is a proposal for starting up Alloy pipelines dynamically based on data in transit. For example, if a discovery component comes up with 10 targets, Alloy can start 10 sub-pipelines, each dealing with a different target.

The PR consists of a design doc in docs/design/ and an experimental implementation. To use it, users must turn on the "experimental" command-line flag.
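The description's example (10 targets, 10 sub-pipelines) can be pictured with a minimal Go sketch. This is not Alloy's implementation: `Target`, `runSubPipeline`, and `fanOut` are hypothetical names, and each sub-pipeline is modeled as a goroutine that runs to completion, whereas real Alloy pipelines are long-running.

```go
package main

import (
	"fmt"
	"sync"
)

// Target is a hypothetical stand-in for one discovered target,
// e.g. a label set produced by a discovery component.
type Target map[string]string

// runSubPipeline is a hypothetical placeholder for the work that one
// sub-pipeline would do for a single target.
func runSubPipeline(t Target) string {
	return "scraped " + t["__address__"]
}

// fanOut starts one sub-pipeline (here: one goroutine) per target and
// waits for all of them. Each goroutine writes to its own slice index,
// so results keep the input order without extra locking.
func fanOut(targets []Target) []string {
	results := make([]string, len(targets))
	var wg sync.WaitGroup
	for i, t := range targets {
		wg.Add(1)
		go func(i int, t Target) {
			defer wg.Done()
			results[i] = runSubPipeline(t)
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	targets := []Target{
		{"__address__": "10.0.0.1:9100"},
		{"__address__": "10.0.0.2:9100"},
	}
	for _, r := range fanOut(targets) {
		fmt.Println(r)
	}
}
```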

Which issue(s) this PR fixes

Fixes #1443

captncraig (Contributor)

I really like the idea of it being component-based. I'm comparing it to Terraform's for_each, where you write a "template" resource with an argument that tells Terraform how to expand it into multiple resource instances:

```hcl
resource "azurerm_resource_group" "rg" {
  for_each = tomap({
    a_group       = "eastus"
    another_group = "westus2"
  })
  name     = each.key
  location = each.value
}
```
  • for_each accepts an expression that evaluates to a map or a set of strings (in our case, limiting it to an array would probably be fine, but maybe we could support maps in a similar way).
  • The each keyword is valid in other arguments to reference the "current" item for iteration.

I really like the notion that this is a single component that would be "expanded" based on the evaluation of some list. It could be a built-in component or a previously imported dynamic component.

I am not sure how naming of the expanded components would work, but we could figure something out. I'm also not sure if it is possible to reference a dynamically created subcomponent from elsewhere in the config.
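For comparison, Terraform addresses each expanded instance by its map key, for example `azurerm_resource_group.rg["a_group"]`. The Go sketch below mimics that expansion for a map-valued for_each, exposing the key and value the way `each.key` and `each.value` do. `Instance` and `expand` are hypothetical; this only illustrates one possible naming scheme for expanded components, not a proposal for Alloy's syntax.

```go
package main

import (
	"fmt"
	"sort"
)

// Instance is a hypothetical expanded copy of a template block.
type Instance struct {
	Address  string // e.g. azurerm_resource_group.rg["a_group"]
	Name     string // the value each.key would have
	Location string // the value each.value would have
}

// expand creates one instance per map entry. Keys are sorted because Go
// map iteration order is random and the result should be deterministic.
func expand(template string, forEach map[string]string) []Instance {
	keys := make([]string, 0, len(forEach))
	for k := range forEach {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	instances := make([]Instance, 0, len(keys))
	for _, k := range keys {
		instances = append(instances, Instance{
			Address:  fmt.Sprintf("%s[%q]", template, k), // name derived from the key
			Name:     k,
			Location: forEach[k],
		})
	}
	return instances
}

func main() {
	for _, inst := range expand("azurerm_resource_group.rg", map[string]string{
		"a_group":       "eastus",
		"another_group": "westus2",
	}) {
		fmt.Println(inst.Address, inst.Location)
	}
}
```

Keying instances by map key rather than by list position keeps their names stable when other entries are added or removed.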

github-actions bot (Contributor) commented Oct 7, 2024

This PR has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If you do not have enough time to follow up on this PR or you think it's no longer relevant, consider closing it.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your PR will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

ptodev (Collaborator, Author) commented Oct 29, 2024

It looks likely that we are going to go with the foreach option. I updated the PR to add some draft code that I've been working on. As a first step, it'd be good to have a unit test that doesn't use the proposed var argument. The test could look something like what's proposed in internal/runtime/testdata/foreach/foreach_1.txtar.

As a second step, we will need to figure out how to refer to the values that are currently iterated on. Should we actually have a var argument? I think it might be unnecessary - having a predefined argument (e.g. val) which refers to the current thing being iterated on may be enough. I'm not yet sure how to add this reserved word into the Alloy syntax though.
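One way to model a predefined per-iteration argument such as `val` is to give every iteration its own child scope in which that name is bound to the current item. The Go sketch below assumes a hypothetical `Scope` type and `iterationScopes` helper; it does not use Alloy's actual syntax or evaluation packages.

```go
package main

import "fmt"

// Scope is a hypothetical set of variables visible to expressions
// inside one foreach iteration.
type Scope map[string]any

// iterationScopes builds one child scope per collection item. Each child
// copies the parent's variables and binds the reserved name to the
// current item. Because the name is reserved, a parent that already
// defines it is rejected instead of being silently shadowed.
func iterationScopes(parent Scope, reserved string, collection []any) ([]Scope, error) {
	if _, exists := parent[reserved]; exists {
		return nil, fmt.Errorf("%q is reserved and cannot be redefined", reserved)
	}
	scopes := make([]Scope, 0, len(collection))
	for _, item := range collection {
		child := make(Scope, len(parent)+1)
		for k, v := range parent {
			child[k] = v
		}
		child[reserved] = item
		scopes = append(scopes, child)
	}
	return scopes, nil
}

func main() {
	scopes, err := iterationScopes(Scope{"env": "prod"}, "val", []any{"a", "b"})
	if err != nil {
		panic(err)
	}
	for _, s := range scopes {
		fmt.Println(s["env"], s["val"])
	}
}
```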

By the way, the OpenTelemetry Collector apparently already has something like dynamic pipelines. There are observer extensions which you can hook up to a receiver creator component. For example, you can have a k8s observer which creates a receiver for each discovered pod.
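The observer and receiver creator pattern boils down to starting a receiver when an endpoint appears and stopping it when the endpoint disappears. The Go sketch below illustrates only that lifecycle; `EndpointEvent`, `Creator`, and `Handle` are hypothetical and are not the Collector's extension APIs.

```go
package main

import "fmt"

// EndpointEvent is a hypothetical notification from an observer,
// e.g. "pod started" or "pod deleted".
type EndpointEvent struct {
	ID    string // stable endpoint identifier
	Added bool   // true when the endpoint appears, false when it goes away
}

// Creator is a hypothetical receiver creator that keeps one running
// receiver per endpoint currently reported by the observer.
type Creator struct {
	running map[string]bool
	log     []string
}

func NewCreator() *Creator {
	return &Creator{running: map[string]bool{}}
}

// Handle starts a receiver for a newly added endpoint and stops it when
// the endpoint is removed. Duplicate or unknown events are ignored.
func (c *Creator) Handle(ev EndpointEvent) {
	switch {
	case ev.Added && !c.running[ev.ID]:
		c.running[ev.ID] = true
		c.log = append(c.log, "start "+ev.ID)
	case !ev.Added && c.running[ev.ID]:
		delete(c.running, ev.ID)
		c.log = append(c.log, "stop "+ev.ID)
	}
}

func main() {
	c := NewCreator()
	for _, ev := range []EndpointEvent{
		{ID: "pod-a", Added: true},
		{ID: "pod-b", Added: true},
		{ID: "pod-a", Added: true}, // duplicate, ignored
		{ID: "pod-a", Added: false},
	} {
		c.Handle(ev)
	}
	fmt.Println(c.log, len(c.running))
}
```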

ptodev force-pushed the ptodev/dynamic-pipelines branch from dd602e6 to 378cb04 on October 31, 2024 19:35
ptodev force-pushed the ptodev/dynamic-pipelines branch from 378cb04 to cc02bfc on November 20, 2024 09:43
ptodev force-pushed the ptodev/dynamic-pipelines branch from d8b2547 to d1591b1 on December 16, 2024 10:37
wildum force-pushed the ptodev/dynamic-pipelines branch from e8cd817 to 8217e86 on December 19, 2024 14:40
wildum force-pushed the ptodev/dynamic-pipelines branch 2 times, most recently from b041704 to 421204a on January 9, 2025 09:23
github-actions bot (Contributor) commented Jan 23, 2025

ptodev changed the title from "Proposal for dynamic pipelines" to "Dynamic pipelines - a new foreach block" on Jan 27, 2025
ptodev and others added 6 commits January 27, 2025 17:10
* add stability lvl to config blocks

* fix import git test
* Add tests for types other than integers

* Minor fixes to string_receiver

* Add a foreach test for maps which contain maps
* Add docs for foreach

* Apply suggestions from code review

Co-authored-by: Clayton Cornell <[email protected]>

* Add a shared experimental_feature snippet

* Addressing PR feedback

* Apply suggestions from code review

Co-authored-by: William Dumont <[email protected]>

---------

Co-authored-by: Clayton Cornell <[email protected]>
Co-authored-by: William Dumont <[email protected]>
ptodev force-pushed the ptodev/dynamic-pipelines branch from e16e3e3 to 417bd4d on January 27, 2025 17:17
ptodev marked this pull request as ready for review on January 27, 2025 17:18
ptodev requested review from clayton-cornell and a team as code owners on January 27, 2025 17:18
mattdurham (Collaborator) left a comment

Fantastic work, gave it a first review and will go over it again. Added some comments on tests.

clayton-cornell added the type/docs label on Jan 28, 2025
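```go
// Diff context for the review thread below (excerpt).
func (fi *forEachChild) Hash() uint64 {
	fnvHash := fnv.New64a()
	// ... (rest of the function is not shown in this excerpt)
}
```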
Collaborator:

Why are we using two different hash functions, sha256 and fnv? Even though their uses don't overlap, it feels odd.

Contributor:

The runner pkg only needs a 64-bit hash, but for the objects in the collection we use a 256-bit hash. I would be OK with using 64 bits for the items in the collection. A collision at this level could result in missing and duplicated metrics in unlucky scenarios, but with 1000 items in the collection (which would be a lot for Alloy) the probability is about 2.71×10^-14. A collision in the runner pkg would be even worse, so I'm not sure the extra security of the 256-bit hash is needed, but I also don't mind it much because it should not make a noticeable difference in performance. @ptodev what do you think?
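The 2.71×10^-14 figure is consistent with the standard birthday-bound approximation for n = 1000 items and a 64-bit hash, p ≈ 1 − exp(−n(n−1)/2^65). The Go sketch below reproduces it; the function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability returns the birthday-bound approximation of the
// chance that at least two of n items share a b-bit hash value:
//   p ≈ 1 - exp(-n(n-1) / (2 * 2^b))
// math.Expm1 keeps precision when the probability is tiny.
func collisionProbability(n, bits float64) float64 {
	space := math.Pow(2, bits)
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

// itemsForProbability inverts the same approximation: roughly how many
// items are needed before the collision probability reaches p.
//   n ≈ sqrt(2 * 2^b * ln(1/(1-p)))
func itemsForProbability(p, bits float64) float64 {
	return math.Sqrt(2 * math.Pow(2, bits) * math.Log(1/(1-p)))
}

func main() {
	fmt.Printf("%.3g\n", collisionProbability(1000, 64))
	fmt.Printf("%.3g\n", itemsForProbability(0.5, 64))
}
```

Run as-is, it prints approximately 2.71e-14 and 5.06e+09. The second value, about 5 billion items for a 50% collision chance with a 64-bit hash, matches the point that billions of items would be needed before a collision becomes likely.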

Collaborator (Author):

IIUC, the hash for items in the collection is 256 bits because each item can contain a lot of data, and the larger hash gives extra protection against collisions. For the foreach ID we use a 64-bit hash since it's just a string. IMO it's OK to use different hashes, but we do need to document why each one is used where it is.
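A minimal Go sketch of the two-hash arrangement discussed here, with comments documenting why each hash is used. How an item is serialized, and the names `itemID` and `runnerHash`, are assumptions rather than the PR's code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// itemID derives a hypothetical foreach ID from a collection item's
// serialized form. SHA-256 is used for its large output space, which
// makes accidental collisions between items negligible.
func itemID(serialized []byte) string {
	sum := sha256.Sum256(serialized)
	return hex.EncodeToString(sum[:])
}

// runnerHash derives a 64-bit value that a runner-style package could use
// to tell tasks apart. FNV-1a is fast and sufficient for a short ID
// string; it is not collision-resistant against adversarial input.
func runnerHash(id string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(id)) // hash.Hash.Write never returns an error
	return h.Sum64()
}

func main() {
	id := itemID([]byte(`{"__address__":"10.0.0.1:9100"}`))
	fmt.Println(len(id), runnerHash(id))
}
```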

Contributor:

We use the hash of the items in the collection for the foreachID, and the foreachID as a key: https://github.com/grafana/alloy/pull/1480/files#diff-9cebda46e5a40368c4b76027ff2bda36114d2737124b5390aa35fa6435d79127R193
The 64-bit hash is used for the runner that wraps around the foreach to run it.
The contents of an item should not matter. Whether the object has 3 or 30 fields should have no influence on the collision probability, I believe. Only the number of items in the collection matters, and we would need billions of items for a collision to be likely.
I'm OK with keeping it as it is and adding a comment saying that we keep the 256-bit hash for extra security, and that it would be fine to switch to a 64-bit hash if it ever becomes a performance bottleneck.
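Using the item hash as the foreachID makes it possible to diff the running children against a new collection by key. The Go sketch below assumes hypothetical `idFor` and `reconcile` helpers and string items; it illustrates the keying idea, not the PR's actual foreach runtime.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// idFor is a hypothetical content-derived ID for one collection item.
func idFor(item string) string {
	sum := sha256.Sum256([]byte(item))
	return hex.EncodeToString(sum[:])
}

// reconcile compares the running children (keyed by ID) with a new
// collection. Items whose ID is already running are left untouched, new
// IDs are started, and IDs no longer present are stopped. Identical
// items share one ID, so they map to a single child.
func reconcile(running map[string]string, collection []string) (started, stopped []string) {
	want := make(map[string]string, len(collection))
	for _, item := range collection {
		want[idFor(item)] = item
	}
	for id, item := range want {
		if _, ok := running[id]; !ok {
			running[id] = item
			started = append(started, item)
		}
	}
	for id, item := range running {
		if _, ok := want[id]; !ok {
			delete(running, id) // deleting during range is safe in Go
			stopped = append(stopped, item)
		}
	}
	sort.Strings(started)
	sort.Strings(stopped)
	return started, stopped
}

func main() {
	running := map[string]string{}
	s, _ := reconcile(running, []string{"a", "b"})
	fmt.Println(s)
	s, st := reconcile(running, []string{"b", "c"})
	fmt.Println(s, st)
}
```

In this sketch, an unchanged item ("b" in the second call) keeps its child, so only the delta is started or stopped.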

mattdurham (Collaborator) left a comment

Small comments but overall fantastic!

Labels: needs-attention, type/docs
Projects: None yet
Development: successfully merging this pull request may close these issues:
Integrate prometheus.exporter and discovery components
6 participants