Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add influx push endpoint to mimir #10153

Merged
merged 57 commits into from
Jan 17, 2025
Merged

Add influx push endpoint to mimir #10153

merged 57 commits into from
Jan 17, 2025

Conversation

alexgreenbank
Copy link
Contributor

@alexgreenbank alexgreenbank commented Dec 6, 2024

What this PR does

Add influx endpoint to Mimir.

Based on original code from #1971 but brought up to date for modern Distributor framework.

MVP. Experimantal/hidden feature so no documentation yet.

Checklist

  • Tests updated.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.
  • Documentation.

@CLAassistant
Copy link

CLAassistant commented Dec 6, 2024

CLA assistant check
All committers have signed the CLA.

Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
Signed-off-by: alexgreenbank <[email protected]>
@alexgreenbank alexgreenbank requested a review from colega December 16, 2024 15:17
pkg/api/api.go Outdated
Comment on lines 272 to 273
// TODO(alexg): hidden behind a featureflag or experimental config option?
a.RegisterRoute(InfluxPushEndpoint, distributor.InfluxHandler(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a feature flag for this is needed. We can just state in the docs (about-versioning.md) that the endpoint is experimental.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

Comment on lines 38 to 41
level.Debug(spanLogger).Log(
"msg", "decodeAndConvert complete",
"bytesRead", bytesRead,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opinionated style nit: I don't think we need 4 lines for this debug log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push.

"bytesRead", bytesRead,
)
if err != nil {
level.Error(logger).Log("err", err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this logger have all the required context? Can you also add some context here about what was going on when this happened? I'm scared of finding this log:

ts=2024-12-17 err="unexpected EOF"

Also, nit, .Error() call is not needed, just pass err.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, fixed in next push.

"github.com/grafana/dskit/grpcutil"
"github.com/grafana/dskit/httpgrpc"
"github.com/grafana/dskit/middleware"
io2 "github.com/influxdata/influxdb/v2/kit/io"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
io2 "github.com/influxdata/influxdb/v2/kit/io"
influxio "github.com/influxdata/influxdb/v2/kit/io"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, fixed in next push.

"github.com/grafana/mimir/pkg/util/spanlogger"
)

func parser(ctx context.Context, r *http.Request, maxSize int, _ *util.RequestBuffers, req *mimirpb.PreallocWriteRequest, logger log.Logger) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function lives in pkg/distributor, I would say it's a little bit pretentious to take the name parser for this :D

How about influxRequestParser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, fixed in next push.

Comment on lines 93 to 94
// TODO(alexg): Do we even need httpgrpc here?
// Check for httpgrpc error, default to client error if parsing failed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any httpgrpc errors being returned by parser

Copy link
Contributor Author

@alexgreenbank alexgreenbank Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, should have got rid of that one. Will resolve once I work out best way to wrap existing error in the StatusBadRequest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, httpgprc is used to smuggle both the http status code and the error message out of the supplier() function. I've removed the misleading comment.

(charInt >= 48 && charInt <= 57) || // 0-9
charInt == 95) { // _

*in = (*in)[:charIndex] + "_" + (*in)[charIndex+1:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocates a new string (hopefully just one, compiler should be clever) for each one of the invalid characters. How about you modify the bytes slice before transforming it to a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Done in latest commit.

pkg/distributor/influxpush/parser.go Outdated Show resolved Hide resolved
func replaceInvalidChars(in *string) {
for charIndex, char := range *in {
charInt := int(char)
if !((charInt >= 97 && charInt <= 122) || // a-z
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use 'a' instead of 97 (is it correct?) you won't need the comment and the code will be less prone to have bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to literal chars instead of ASCII codes.

pkg/distributor/influxpush/parser.go Outdated Show resolved Hide resolved
Comment on lines 121 to 128
key := string(tag.Key)
if key == "__name__" || key == internalLabel {
continue
}
replaceInvalidChars(&key)
lbls = append(lbls, mimirpb.LabelAdapter{
Name: key,
Value: string(tag.Value),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do yoloString here instead of string(), as the push code is crafted to avoid keeping references to the strings from LabelAdapter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Fixed as per suggestion. I take it no other strings are guaranteed not to be referenced and so it can't be used anywhere else?

@alexgreenbank
Copy link
Contributor Author

alexgreenbank commented Dec 19, 2024

OK, comments left to deal with

  • end2end test for happy path
  • TryUnwrap
  • work on byteslice rather than chars
  • yoloString

@alexgreenbank alexgreenbank marked this pull request as ready for review January 10, 2025 23:03
Copy link
Contributor

@tacole02 tacole02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@alexgreenbank alexgreenbank requested a review from colega January 16, 2025 09:10
Comment on lines 108 to 112
// TODO(alexg): One thing we have seen in the past is that telegraf clients send a batch of data
// if it is too big they should respond to the 413 below, but if a client doesn't understand this
// it just sends the next batch that is even bigger. In the past this has had to be dealt with by
// adding rate limits to drop the payloads.
level.Warn(logger).Log("msg", "request too large", "err", err, "bytesRead", bytesRead, "maxMsgSize", maxRecvMsgSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a TODO? What is pending to be done here?

Copy link
Contributor Author

@alexgreenbank alexgreenbank Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the TODO. This is more an operational issue, something that I plan on taking a look at during the dev/ops testing and will add it to the epic.

When the telegraf sends a payload that is too big it just tries it again a bit later with even more data, and so the problem never seems to fix itself.

We need to documented method/runbook for dealing with this situation (which is easy to trigger given we can control the maximum payload size we accept) otherwise it never resolves itself. We need to test how the telegraf agent responds to different return codes, or come up with another plan.

For now the code does the right thing and logs the error and rejects the payload.

Removed the TODO and comment for now.

CHANGELOG.md Outdated
@@ -2,6 +2,8 @@

## main / unreleased

* [FEATURE] Distributor: Add experimental Influx handler. #10153
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to the Grafana Mimir section below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@colega colega changed the title add influx push endpoint to mimir Add influx push endpoint to mimir Jan 17, 2025
@colega colega merged commit 1836ff1 into main Jan 17, 2025
31 checks passed
@colega colega deleted the alexg/influx-push-handler branch January 17, 2025 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants