-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add influx push endpoint to mimir #10153
Conversation
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
pkg/api/api.go
Outdated
// TODO(alexg): hidden behind a featureflag or experimental config option? | ||
a.RegisterRoute(InfluxPushEndpoint, distributor.InfluxHandler( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a feature flag for this is needed. We can just state in the docs (about-versioning.md) that the endpoint is experimental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done.
pkg/distributor/influx.go
Outdated
level.Debug(spanLogger).Log( | ||
"msg", "decodeAndConvert complete", | ||
"bytesRead", bytesRead, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opinionated style nit: I don't think we need 4 lines for this debug log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push.
pkg/distributor/influx.go
Outdated
"bytesRead", bytesRead, | ||
) | ||
if err != nil { | ||
level.Error(logger).Log("err", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this logger have all the required context? Can you also add some context here about what was going on when this happened? I'm scared of finding this log:
ts=2024-12-17 err="unexpected EOF"
Also, nit, .Error()
call is not needed, just pass err
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, fixed in next push.
pkg/distributor/influx.go
Outdated
"github.com/grafana/dskit/grpcutil" | ||
"github.com/grafana/dskit/httpgrpc" | ||
"github.com/grafana/dskit/middleware" | ||
io2 "github.com/influxdata/influxdb/v2/kit/io" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
io2 "github.com/influxdata/influxdb/v2/kit/io" | |
influxio "github.com/influxdata/influxdb/v2/kit/io" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, fixed in next push.
pkg/distributor/influx.go
Outdated
"github.com/grafana/mimir/pkg/util/spanlogger" | ||
) | ||
|
||
func parser(ctx context.Context, r *http.Request, maxSize int, _ *util.RequestBuffers, req *mimirpb.PreallocWriteRequest, logger log.Logger) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function lives in pkg/distributor
, I would say it's a little bit pretentious to take the name parser
for this :D
How about influxRequestParser
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, fixed in next push.
pkg/distributor/influx.go
Outdated
// TODO(alexg): Do we even need httpgrpc here? | ||
// Check for httpgrpc error, default to client error if parsing failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any httpgrpc errors being returned by parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, should have got rid of that one. Will resolve once I work out best way to wrap existing error in the StatusBadRequest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, httpgprc
is used to smuggle both the http status code and the error message out of the supplier()
function. I've removed the misleading comment.
pkg/distributor/influxpush/parser.go
Outdated
(charInt >= 48 && charInt <= 57) || // 0-9 | ||
charInt == 95) { // _ | ||
|
||
*in = (*in)[:charIndex] + "_" + (*in)[charIndex+1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allocates a new string (hopefully just one, compiler should be clever) for each one of the invalid characters. How about you modify the bytes slice before transforming it to a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Done in latest commit.
pkg/distributor/influxpush/parser.go
Outdated
func replaceInvalidChars(in *string) { | ||
for charIndex, char := range *in { | ||
charInt := int(char) | ||
if !((charInt >= 97 && charInt <= 122) || // a-z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use 'a' instead of 97 (is it correct?) you won't need the comment and the code will be less prone to have bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to literal chars instead of ASCII codes.
pkg/distributor/influxpush/parser.go
Outdated
key := string(tag.Key) | ||
if key == "__name__" || key == internalLabel { | ||
continue | ||
} | ||
replaceInvalidChars(&key) | ||
lbls = append(lbls, mimirpb.LabelAdapter{ | ||
Name: key, | ||
Value: string(tag.Value), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do yoloString
here instead of string()
, as the push code is crafted to avoid keeping references to the strings from LabelAdapter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Fixed as per suggestion. I take it no other strings are guaranteed not to be referenced and so it can't be used anywhere else?
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
OK, comments left to deal with
|
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
…into alexg/influx-push-handler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Co-authored-by: Oleg Zaytsev <[email protected]>
Co-authored-by: Oleg Zaytsev <[email protected]>
Co-authored-by: Oleg Zaytsev <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
pkg/distributor/influx.go
Outdated
// TODO(alexg): One thing we have seen in the past is that telegraf clients send a batch of data | ||
// if it is too big they should respond to the 413 below, but if a client doesn't understand this | ||
// it just sends the next batch that is even bigger. In the past this has had to be dealt with by | ||
// adding rate limits to drop the payloads. | ||
level.Warn(logger).Log("msg", "request too large", "err", err, "bytesRead", bytesRead, "maxMsgSize", maxRecvMsgSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really a TODO? What is pending to be done here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the TODO. This is more an operational issue, something that I plan on taking a look at during the dev/ops testing and will add it to the epic.
When the telegraf
sends a payload that is too big it just tries it again a bit later with even more data, and so the problem never seems to fix itself.
We need to documented method/runbook for dealing with this situation (which is easy to trigger given we can control the maximum payload size we accept) otherwise it never resolves itself. We need to test how the telegraf
agent responds to different return codes, or come up with another plan.
For now the code does the right thing and logs the error and rejects the payload.
Removed the TODO and comment for now.
CHANGELOG.md
Outdated
@@ -2,6 +2,8 @@ | |||
|
|||
## main / unreleased | |||
|
|||
* [FEATURE] Distributor: Add experimental Influx handler. #10153 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this to the Grafana Mimir section below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: alexgreenbank <[email protected]>
What this PR does
Add influx endpoint to Mimir.
Based on original code from #1971 but brought up to date for modern Distributor framework.
MVP. Experimantal/hidden feature so no documentation yet.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.