Merge back docker logs #1436
Comments
This seems like something the docker source should possibly support. As for the transforms, I believe you can return None and then move
The Docker source doesn't support checkpointing, so its application is very limited.
I mean specifically that any source that might always get split logs should, in theory, be able to handle this without needing an extra transform.
@LucioFranco can you share why? We had a good discussion about this in #1431. I'm not sure I agree with you. The fact that you can read docker logs from multiple sources means that we should, in some way, decouple the logic of joining events from sources.
Sure, decoupling is fine, but I think that adds a level of indirection, and I think the docker source should just work out of the box for specific cases that are produced by docker. This seems like a pathological case where it will always happen for messages over a certain size, and a user who just adds the docker source should have this work. As for checkpointing: in this case you don't have the full log until you send it down, so if the docker daemon gets reset you have a half-corrupted log in your buffer, which could affect downstream transforms. If you aggregate at the source, you can ensure that you don't get corrupted logs. As for the transform, it should be possible to aggregate there anyway, though you could hook up multiple sources to it and there is not really a way to differentiate between them. I think that should be a fairly obvious thing for users to notice if they are getting tangled logs.
Thanks.
Agree 👍. I'm just curious how you'd solve reading Docker logs from another source, like
Special handling of these cases in sources?
And maybe some shared code behind the scenes.
Nice, that's interesting and not a bad idea. We've discussed similar options for decoding as well, avoiding the need for a
I was thinking about this issue recently, and I think I have come up with a very flexible design. Here is my plan.
This would allow us to solve a much bigger set of issues for the users (i.e. it's a generic solution, not just for the docker source) and doesn't involve special casing (i.e. a streaming JSON parser works only with JSON data). What do you think?
I missed a couple of messages here, and I see it was discussed before that the docker source should merge messages automatically out of the box. We can achieve this with the design I'm proposing by adding the transform from (2) to all the configurations by default. Personally, I would vote against merging automatically for the reasons described above (at (2)), but I can see why people might find it useful, so I don't mind it as long as it can be turned off, because in some cases I do want to do lower-level log aggregation, for example, for docker engine debugging purposes.
The desire to implement this as primitives comes from the desire to
Alternatives are to implement all the logic described in (1) and (2) within the relevant sources. That would, however, be way less flexible and configurable. The implementation might be simpler, mainly because we won't need to deal with adding the "related messages", but it might also not be too bad if we figure out the right abstraction.
I forgot to mention: flexibility is an important feature of the design.
Nice writeup! I have a few questions and suggestions before we move forward with this:
Comments
Journald provides this as part of the message's metadata? Do you mind providing an example or linking to the docs? I wasn't aware of this 😄 .
This is tangential, but you might be conflating 2 separate issues here. In my opinion, we should solve the message continuation problem without regard to downstream side-effects and add other separate options for these side-effects. For example, the
Factors
There are a few factors worth addressing individually:
Final Thoughts
I'm not entirely sure what the best approach is yet, but I'm leaning towards the proposed transform approach. Especially given #1447, which makes chaining transforms trivial. We can solve factor #1 (UX) by including this in the documentation examples. Before we proceed, I want to get buy-in from other members on the team.
It's neither a Journald feature nor metadata. It's a docker log driver feature; here is an example dump
It doesn't solve it for all cases. Specifically, the docker case can't be handled: #1431
Having this as a separate transform solves this and similar problems in the most generic way. One thing you should take care of: merge partial messages by a specific key (
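To illustrate the point about merging by a specific key, here is a minimal Python sketch, not Vector's implementation. It assumes each event is a dict with a `message` field; the field names `container_id` and `partial` are hypothetical placeholders for whatever key and partial marker a real transform would be configured with. Keeping one buffer per key stops interleaved events from different containers from being tangled together.

```python
from typing import Dict, Iterable, Iterator, List


def merge_partials(
    events: Iterable[dict],
    stream_key: str = "container_id",  # hypothetical field name
    partial_key: str = "partial",      # hypothetical field name
) -> Iterator[dict]:
    """Merge partial events, buffering separately per stream key."""
    pending: Dict[object, List[str]] = {}
    for event in events:
        stream = event.get(stream_key)
        parts = pending.setdefault(stream, [])
        parts.append(event["message"])
        if event.get(partial_key):
            # More chunks are expected for this stream; keep buffering.
            continue
        # Final chunk: emit a copy carrying the joined message.
        merged = dict(event)
        merged["message"] = "".join(parts)
        merged.pop(partial_key, None)
        del pending[stream]
        yield merged
```

With interleaved input from containers `a` and `b`, each container's chunks are joined only with its own earlier chunks. This sketch never expires a buffer, so a stream that stops mid-message would need a timeout policy in practice.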
Yeah, sorry, I was under the impression that the partial message indicator is a common
I like the concept of using message fields to indicate that a message is partial. This is a simple, “no magic” approach, compared to storing this data as an internal per-message flag. I.e. to mark a docker source message as partial, the transform would have to peek into the
In regards to user experience: exposing a lot of tweakability might be overwhelming, but it is necessary to cover all cases. Therefore I would propose to also add some kind of configuration presets and reusability. We can hard-code some well-known patterns behind some easy-to-turn-on flags. For the docker case, we might introduce configuration in the docker source that would automatically add a merging transform preconfigured for the docker source, or something like that.
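As a rough Python sketch of the "mark with a message field, expose presets" idea (an illustration only, not an actual Vector API): each marker peeks at an event and sets a hypothetical `_partial` field, and a preset table maps an easy-to-turn-on name to a marker. The preset names, the `_partial` field, and the dict-shaped events are all assumptions; the two detection rules are the ones described in this issue.

```python
from typing import Callable, Dict

PARTIAL_FIELD = "_partial"  # hypothetical field name


def mark_docker_json(event: dict) -> dict:
    """json log driver: a chunk without a trailing newline is partial."""
    marked = dict(event)
    marked[PARTIAL_FIELD] = not event["message"].endswith("\n")
    return marked


def mark_journald(event: dict) -> dict:
    """Journald: every part but the last carries CONTAINER_PARTIAL_MESSAGE."""
    marked = dict(event)
    marked[PARTIAL_FIELD] = "CONTAINER_PARTIAL_MESSAGE" in event
    return marked


# Hypothetical preset names a user could enable instead of hand-tuning.
PRESETS: Dict[str, Callable[[dict], dict]] = {
    "docker_json": mark_docker_json,
    "journald": mark_journald,
}
```

A downstream merging step would then only need to look at the one field, regardless of which source produced the event.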
This one replaces #1431, so, @lukesteensen would you mind if I take this?
Sure, @MOZGIII, that approach sounds reasonable! It's possible we could generalize the existing line aggregator that we've built into the file source to achieve this.
While conducting the research on docker, I stumbled upon this. This is the approach utilized by the
We could implement both, or choose just one approach and go with it.
I don't see how the referenced function is related to the discussion. They
If you apply this to the docker (outer) json, it'll split exactly as it does now. If you unwrap the outer json, you can handle only json logs with this approach.
I don't think that function has anything to do with
Btw, I created a repo to replicate the issue: https://github.com/MOZGIII/vector-merge-test-setup
I just realized that the approach that the
Closed via #1504
Looks nice and usable.
@anton-ryzhov they should have been fixed via #1661 but maybe they haven't been? I'll reopen that issue.
Docker log drivers split messages into 16k chunks (moby/moby#32923).
The json log driver splits logs into multiple messages, but only the last one has trailing newline symbols (\r\n).
In Journald, all parts but the last contain the CONTAINER_PARTIAL_MESSAGE property.
These features can be used to merge log events back together. But currently this isn't possible with vector, because transforms can't access two consecutive events to merge them.
See also #1431 discussion
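A minimal Python sketch of the trailing-newline rule for the json log driver, assuming the chunks of a single container arrive in order as plain strings. The function name is made up for illustration and this is not how vector implements it.

```python
from typing import Iterable, Iterator, List


def merge_by_trailing_newline(chunks: Iterable[str]) -> Iterator[str]:
    """Join consecutive chunks until one ends with a newline."""
    buffer: List[str] = []
    for chunk in chunks:
        buffer.append(chunk)
        if chunk.endswith("\n"):
            # Last part of a message: emit it without the line ending.
            yield "".join(buffer).rstrip("\r\n")
            buffer.clear()
    # Emit whatever is left at end of input; a real implementation
    # would need a timeout or shutdown policy here instead.
    if buffer:
        yield "".join(buffer)
```

For example, the chunks `"aaa"`, `"bbb\r\n"`, `"ccc\n"` come out as the two messages `"aaabbb"` and `"ccc"`.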