Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metadata to push payload #9694
Add metadata to push payload #9694
Changes from 16 commits
b799556
b0810c8
b7ae131
9bec73a
d876c21
9c6fdef
b4ed160
8c6f22e
c854d06
95ba8a2
7db3e54
ea8f3f6
73cf5dc
7b77c69
44b0932
617e864
f43dfab
384d556
4088112
0b0af55
d4e311c
10d404e
fcd1e66
64d3d6c
96c804d
7f25bb8
752ae6b
4b67145
aa58916
4c60a9e
4028758
8c76150
fb505b4
ccaf192
fb42e9c
eaf235c
7438809
f6e6918
f8c33aa
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this a string feels counterintuitive compared to using
map[string]string
, which is what we use via theLabelSet
struct used in our v1 push payloads inloghttp/push
. Point being, I think we should use JSON directly (not json with embedded stringified json fields) whenapplication/json
is theContent-Type
. When snappy-encoding is used,logproto
can use the string variants for consistency with how it treats labels.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I see with making this a map, and keeping as a string the labels for an entry in logproto, is that converting from
loghttp.Entry
tologproto.Entry
is going to be way slower.Right now (both with string labels), we just cast the pointer for the entries array of a stream since they have the same memory layout.
loki/pkg/loghttp/query.go
Line 236 in 44b0932
If we decide to make one a map and another one a string, we won't be able to do that cast anymore (different memory layouts). Hence, we'll need to go entry by entry converting it from
loghttp.Entry
tologproto.Entry
.IMO, we should make both use the same layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this in f43dfab
This is what we need to do instead of the cast:
loki/pkg/loghttp/query.go
Lines 236 to 239 in f43dfab
If we decide to keep this as a string, we can revert the commit. If we decide to make logproto.Entry to use a map for labels, we can build on top of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to our API docs, we use JSON object for stream labels )https://grafana.com/docs/loki/latest/api/#push-log-entries-to-loki
so, it makes sense to use JSON object for line metadata also… but you are right @salvacorts that there is no need to convert it to the string when we convert
loghttp.Entry
tologproto.Entry
….There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@salvacorts You're right, making this anything but a
string
here will result in performance losses because this it would need to parse labels potentially on every entry coming back from subqueries. Since this happens before we filter out unnecessary entries (overlimit
, etc), this can quickly result in higher resource utilization, OOMs, etc.Now, we have a few options:
string
in memory, but turn it into amap
when return thejson
response from the http api. This will maintain performance (no conversion from protos) & correctly return a map via the json api.LabelSet
for this field (stored as amap
). This requires parsing strings -> json for potentially every log line returned from queriers (yikes!). Even when combined with (4), this would require transformingslice -> map
.labels.Labels
(stored as aslice
). When combined with (4) this incurs no transformation cost and the resulting conversion to the json map for the external API is simple & relatively inexpensive.EntryAdapter.labels
astring
for parity withStreamAdapter.labels
. Instead, I think we should use theLabelAdapter
pattern to send the labels in the protos as a slice and combine it with (3) to not need to pay conversion costs.Does this make sense?
P.S. We'll actually need to do this on the
Entry
and notEntryAdapter
types (we only keep the adapter ones around for bench comparisons, but they're unused). Happy to talk through this more when I've got a bit more time, but we did this primarily for serialization optimizations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vote for 3 and 4. looks like the most optimal way for us. wdyt @salvacorts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Implemented it in 4028758. We now use
labels.Labels
in bothloghttp.Entry
andlogproto.Entry
.I wonder if we should take care of duplicated labels. I think may not (?) provided the push payload in json is an object so it should have duplicated keys (is that somehow enforced?).
Note that proto3 supports maps. In the wire, the map is the same as an array so it needs extra logic to have the map API. I ran a benchmark and looks like maps compared to arrays are ~x3 slower to marshal and ~x2 slower to unmarshal (happy to share/discuss the benchmark). Therefore I agree the way to go is using arrays in protobuf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These types are used in the external API, meaning we'd return stringified labels as a field, which we shouldn't do. We should use json semantics here. Ideally, we should also reflexively be able to encode -> decode, meaning json
unmarshaling
should expect a json object of non-indexed labels as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with f43dfab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this stored a pointer, now it stores a struct. Not sure we want this (would need to be tested independently for performance)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from this PR:
#1145
The only reason why I think this is a pointer is so we don't make a copy of the stream when calling NewStream which receives a ptr to
loghttp.Stream
https://github.com/grafana/loki/pull/1145/files#diff-71619e1c80a73b34eade235a55d012a0ddbb3375b8d4ac89c1f4fd672145b915R34
With Sandeep's changes, we no longer use that since now we decode from json to
loghttp.PushRequest
to then cast it tologproto.PushRequest
loki/pkg/util/unmarshal/unmarshal.go
Line 22 in 44b0932
So having that as a ptr or not shouldn't make much difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense to implement
UnmarshalJSON
onlogproto.Stream
directly -- is that feasible?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree!Edit: I think that's not possible right-away.
logproto.Stream
is an alias topush.Stream
provided in the packagegithub.jparrowsec.cn/grafana/loki/pkg/push
which is used in this package.UnmarshalJSON uses the
LabelsSet
, sopkg/push
would need to includepkg/loki
which would end up in a cycling dependency.https://github.com/grafana/loki/pull/9694/files#diff-5b050fb13a302741e2f4a781fa54987e96da67696ae36ea41aa971ef431bfeccR84
I think this Unmarshalling method make sense to live here since:
logproto.Stream
from a json. Moreover, this json has a format for entries (json array) that only this package should know/care about.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can bring all the parsing code to a single place when we work on the endpoint for supporting native OTLP ingestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels weird -- is there a chance
err
can be non-nil butparseError
is nil?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling of the implementation for
jsonparser.ArrayEach
doesn't seems correct to me. The error being passed to the callback is weird as it'll always be nil. Then, after calling the callback, the library checks for the value of the error, which will always be nil since it's passed by value to the callback. There's an issue open for this:buger/jsonparser#255
IIUC,
parseError
can be nil whileerr
can be non-nil as it can return for example a MalformedJsonError.Having said that, if
err
is not nil, we should returnerr
, notparseError
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with 384d556
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to check types for the values here to ensure they're strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with 384d556
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the scariest thing in the PR ) we should use it only if we check that the buffer is not updated, however, a lot of functions in stack trace mean that somebody can do some optimization to reuse the slice and it will break all our implementation. Why do we need it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in Slack, removed.