-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Promtail: Add new target for receiving GCP PubSub Push messages #6777
Conversation
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
8c65211
to
95b9b60
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
@thepalbi thanks a ton for adding support for GCP pubsub Just wanted to clarify one thing. Can we still keep single target type for GCP( We can keep everything more or less same, except if wdyt? |
@@ -37,7 +37,7 @@ type Config struct { | |||
PipelineStages stages.PipelineStages `yaml:"pipeline_stages,omitempty"` | |||
JournalConfig *JournalTargetConfig `yaml:"journal,omitempty"` | |||
SyslogConfig *SyslogTargetConfig `yaml:"syslog,omitempty"` | |||
GcplogConfig *GcplogTargetConfig `yaml:"gcplog,omitempty"` | |||
GcplogConfig *GCPLogTargetConfig `yaml:"gcplog,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive by: renamed the main target to use a better case
} | ||
|
||
// nolint:revive | ||
func newGcplogTarget( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe this one should also be renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just returning the struct. why don't we make it as part of newPullTarget
itself?. Any reason to have it as separate function?
// If the incoming request carries the tenant id, inject it as the reserved label, so it's used by the | ||
// remote write client. | ||
if xScopeOrgID != "" { | ||
labels[lokiClient.ReservedLabelTenantID] = model.LabelValue(xScopeOrgID) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functionality mimics what was done in d4cdc37. The idea is to make this target multitenant-supporting, so one can inject the tenantID from upstream using this reserved header (which is well-kown in loki world) and let the loki-client forward it when doing the remote write.
@kavirajk completely agree. Just merged both targets under the same package. Refactored (mainly renames) a little bit the Pull one so that both of them fit better. Let me know what you think 😀 |
ff3faf7
to
9f481cd
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉 thanks @thepalbi for addressing the comments.
Left few minor nits and suggestion. We are good to merge after that.
} | ||
|
||
// nolint:revive | ||
func newGcplogTarget( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just returning the struct. why don't we make it as part of newPullTarget
itself?. Any reason to have it as separate function?
@thepalbi also can you also please update the docs accordingly? |
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
Yup, wanted to have a final 👍 on the implementation before doing this. Will add the docs so the PR is sounds! |
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
@kavirajk added the docs on |
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
35446db
to
d461e11
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
Do we have any concerns about this target potentially allowing users to easily hit per stream rate limit issues? It looks to me like it would be very easy to configure many things to all write to one pubsub and have it all pushed to promtail as a single stream. Should we consider enforcing some amount of the attributes being used as labels? |
@cstyan that's the perfectly valid concern. Even now user can take advantage of relabeling those gcp atrributes. And you are right, user may not be aware about the impact before sending too much messages into single stream. IMO, good place to add this note is in gcp-cloud setup doc https://grafana.com/docs/loki/latest/clients/promtail/gcplog-cloud/ I think @thepalbi decided to update the cloud setup doc in the follow up PR. Can you take care of that when creating follow up PR @thepalbi ? |
Co-authored-by: Kaviraj Kanagaraj <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
bf843ef
to
1dfa0dd
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for revising the docs. The docs in this PR look good to me. I put in a suggestion for a typo fix, but I'm approving now so you won't need a re-review from me.
Co-authored-by: Karen Miller <[email protected]>
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
} | ||
|
||
func (h *pushTarget) push(w http.ResponseWriter, r *http.Request) { | ||
entries := h.handler.Chan() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the channel would change often. Why not h.entires
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean saving h.handler.Chan()
as a struct field, and directly accessing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I was out. Yes.
serverConfig, port, err := getServerConfigWithAvailablePort() | ||
require.NoError(t, err, "error generating server config or finding open port") | ||
config := &scrapeconfig.GcplogTargetConfig{ | ||
Server: serverConfig, | ||
Labels: tc.args.Labels, | ||
UseIncomingTimestamp: false, | ||
SubscriptionType: "push", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running the server is quite a handful. What do you think about testing the handler function push
with a ResponseRecorder instead?
Thanks for the look @jeschkies , I'll address the comments in a follow up 😀 |
<!-- Thanks for sending a pull request! Before submitting: 1. Read our CONTRIBUTING.md guide 2. Name your PR as `<Feature Area>: Describe your change`. a. Do not end the title with punctuation. It will be added in the changelog. b. Start with an imperative verb. Example: Fix the latency between System A and System B. c. Use sentence case, not title case. d. Use a complete phrase or sentence. The PR title will appear in a changelog, so help other people understand what your change will be. 3. Rebase your PR if it gets out of sync with main --> **What this PR does / why we need it**: This PR includes some pending improvements to newish promtail GCP push target: - Pending low hanging optimization for replacing a regex that gets evaluated on every incoming request from GCP, for a simpler solution. [Related comments](#6777 (comment)) - Some small refactors. See: [this](#6777 (review)) - Add some validations over the payload the target receives. Payload details are [here](https://cloud.google.com/pubsub/docs/push). This PR adds just validation over the log line, the incoming GCP message ID to add some traceability if needed, and the subscription name because it can be used as label. - Add `subscription` GCP payload field as label, which includes the PubSub subscription identifier. - Add `cause` label to error metrics. Since there's multiple failure point this might help debugging / notice strange behaviours. Also, by running the push target in some dev workloads, we noticed it has some labels different to the ones added in the pull target. This PR also refactors the logic to re-use the same GCP log entry parsing logic in both targets. That way the labels exposed to the user in the push target are $Labels_{pull} \cup Labels_{push}$. **Which issue(s) this PR fixes**: Related to: grafana/cloud-onboarding#2067 **Special notes for your reviewer**: This PR is a follow up [this other one](#6777). It works on some comments made after merge, and adds some changes we saw necessary upon running this target in real environments. <!-- Note about CHANGELOG entries, if a change adds: * an important feature * fixes an issue present in a previous release, * causes a change in operation that would be useful for an operator of Loki to know then please add a CHANGELOG entry. For documentation changes, build changes, simple fixes etc please skip this step. We are attempting to curate a changelog of the most relevant and important changes to be easier to ingest by end users of Loki. Note about the upgrade guide, if this changes: * default configuration values * metric names or label names * changes existing log lines such as the metrics.go query output line * configuration parameters * anything to do with any API * any other change that would require special attention or extra steps to upgrade Please document clearly what changed AND what needs to be done in the upgrade guide. --> **Checklist** - [x] Documentation added - [x] Tests updated - [ ] Is this an important fix or new feature? Add an entry in the `CHANGELOG.md`. - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` Co-authored-by: Kaviraj Kanagaraj <[email protected]>
<!-- Thanks for sending a pull request! Before submitting: 1. Read our CONTRIBUTING.md guide 2. Name your PR as `<Feature Area>: Describe your change`. a. Do not end the title with punctuation. It will be added in the changelog. b. Start with an imperative verb. Example: Fix the latency between System A and System B. c. Use sentence case, not title case. d. Use a complete phrase or sentence. The PR title will appear in a changelog, so help other people understand what your change will be. 3. Rebase your PR if it gets out of sync with main --> **What this PR does / why we need it**: This PR includes some pending improvements to newish promtail GCP push target: - Pending low hanging optimization for replacing a regex that gets evaluated on every incoming request from GCP, for a simpler solution. [Related comments](grafana#6777 (comment)) - Some small refactors. See: [this](grafana#6777 (review)) - Add some validations over the payload the target receives. Payload details are [here](https://cloud.google.com/pubsub/docs/push). This PR adds just validation over the log line, the incoming GCP message ID to add some traceability if needed, and the subscription name because it can be used as label. - Add `subscription` GCP payload field as label, which includes the PubSub subscription identifier. - Add `cause` label to error metrics. Since there's multiple failure point this might help debugging / notice strange behaviours. Also, by running the push target in some dev workloads, we noticed it has some labels different to the ones added in the pull target. This PR also refactors the logic to re-use the same GCP log entry parsing logic in both targets. That way the labels exposed to the user in the push target are $Labels_{pull} \cup Labels_{push}$. **Which issue(s) this PR fixes**: Related to: https://github.com/grafana/cloud-onboarding/issues/2067 **Special notes for your reviewer**: This PR is a follow up [this other one](grafana#6777). It works on some comments made after merge, and adds some changes we saw necessary upon running this target in real environments. <!-- Note about CHANGELOG entries, if a change adds: * an important feature * fixes an issue present in a previous release, * causes a change in operation that would be useful for an operator of Loki to know then please add a CHANGELOG entry. For documentation changes, build changes, simple fixes etc please skip this step. We are attempting to curate a changelog of the most relevant and important changes to be easier to ingest by end users of Loki. Note about the upgrade guide, if this changes: * default configuration values * metric names or label names * changes existing log lines such as the metrics.go query output line * configuration parameters * anything to do with any API * any other change that would require special attention or extra steps to upgrade Please document clearly what changed AND what needs to be done in the upgrade guide. --> **Checklist** - [x] Documentation added - [x] Tests updated - [ ] Is this an important fix or new feature? Add an entry in the `CHANGELOG.md`. - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` Co-authored-by: Kaviraj Kanagaraj <[email protected]>
What this PR does / why we need it:
Promtail currently supports scraping GCP logs using a pull model, in which promtail subscribes to a GCP PubSub topic, to where logs are routed. There's another mechanism possible for receiving messages from PubSub topics (for example logs, but could potentially be other kind of messages) which is by using a push subscription.
This PR adds a new Promtail target which is able to receive messages from a GCP PubSub push subscription. It has been mainly thought for logs, but since it's generic it's able to support whatever message the GCP topic sends, since they all use a similar envelope. The target exposes and HTTP server (with just one
POST /gcp/api/v1/push
endpoint), which is then configured as target of this Push subscriptions.Just to give an example, the envelope that GCP uses for this messages has the following shape:
This message has been extracted from a push subscription, listening for logs from a cloud-function.
The target can be configured as follows:
This target exposes some of the received "label" as "internal labels" to be re-labeled by the user using the scrape config (much like the syslog target does), and dropped if not used. These are:
__gcp_message_id
: message id__gcp_attributes_*
: All attributes read from.message.attributes
are added as a temporal label, conveniently adapting the label name since it might contain non-loki-supported characters. For example forlogging.googleapis.com/timestamp
it's converted to__gcp_attributes_logging_googleapis_com_timestamp
.Some things to keep in mind:
Which issue(s) this PR fixes:
n/a
Special notes for your reviewer:
Documentation to be added.
Checklist
CHANGELOG.md
.docs/sources/upgrading/_index.md