New merge transform #1488
Comments
I'm not sure about this part. Would we roll that functionality into the new transform? There are likely quite a few situations where it would be a better fit than the new strategy of the source detecting partial messages. Or we could keep it as a way to have sources identify which messages are partial? More generally, while I like the idea of implementing this in a way that's not tied to a specific source, I am slightly skeptical of requiring the user to configure a separate transform with this. Could we not just expose it through options on the sources like |
Before I start with the actual implementation, I'd like to prepare some test cases. Is there any place where I could put this setup? Not sure it should belong to the main |
This could fit as a correctness test in our test harness /cc @binarylogic |
Great. I created a repo in the meantime: https://github.com/MOZGIII/vector-merge-test-setup |
Yeah, that would be ideal. There's a little more involved there since the test harness involves Terraform and Ansible scripts. I'd say keep moving forward with your repo and then we can pair on transitioning it to our test harness when you're ready. |
Sounds good! I've used Ansible and Terraform before, but still, I'd appreciate guidance! Before we can move it to test harness, I need to figure out a way to programmatically assert the behavior. With my current setup that'd be problematic, however, I imagine it can be done better if we can spare a full VM for it. |
Yep. For example, here's a very simple nested JSON test: You can read about the test here: https://github.com/timberio/vector-test-harness/tree/master/cases/wrapped_json_correctness I don't want to distract you with this stuff right now, but I think it's good for you to see how we're using Ansible to run test cases. When you're ready we can pair and get it set up. |
I'm also not sure about the deprecation since I don't fully understand yet if we're covering all the use cases of the
The main beauty of having this a separate transform is it won't be limited to just merging the raw input from the sources. For instance in the JSON-in-JSON case, one would be able to unwrap the "JSON envelope" and merge over a field from the wrapped document. I think that'd be cool. |
If anyone happens to have a VM dump with |
Also, regarding |
I also realized that we should probably take a look at k8s source as well. It's implemented separately, but the design goal there, as noted in the comments, is for it to be interchangeable with |
Sure, but the
Makes sense to me. |
Hmm, that's interesting. To my knowledge, there's no spec that requires the container runtime to add |
I just realized the docker approach to partial lines (partial message marker by the presence or absence of the {"log":"{ \"long_value\": \"long value ... \\n", ...}
{"log":" ... \" }", ...} It might so happen that a long log message is split such that the end of the first chunk is |
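To make the discussion concrete, here is a minimal sketch of merging records shaped like the two sample lines above. It is not Vector's implementation. It assumes the partial-message marker is the absence of a trailing newline in the decoded `log` field, and the function name `merge_docker_chunks` is hypothetical. It also illustrates the "unwrap the envelope, then merge over a field" idea mentioned earlier in the thread.

```python
import json


def merge_docker_chunks(lines):
    """Merge JSON records whose "log" field was split upstream.

    Assumption: a record is a partial chunk when its decoded "log"
    value does not end with a newline character.
    """
    merged = []
    fragments = []  # "log" pieces of the message currently being assembled
    for line in lines:
        record = json.loads(line)  # unwrap the JSON envelope
        fragments.append(record["log"])
        if record["log"].endswith("\n"):
            # Final chunk: emit one record with the joined message,
            # keeping the other fields of the last chunk.
            record["log"] = "".join(fragments)
            merged.append(record)
            fragments = []
    if fragments:
        # Input ended mid-message: emit what was collected instead of dropping it.
        merged.append({"log": "".join(fragments)})
    return merged
```

Keeping the other fields of the last chunk is an arbitrary choice made for this sketch; a real transform would need an explicit rule for fields such as timestamps.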
Actually, we can add special casing for |
So, the summary of the progress on this issue:
|
I'm opening this issue to explicitly represent the work of a merge transform (I'm open to better names). The purpose of this transform is to merge log lines that were split upstream. For example, the docker source caps logs at 16 KB and therefore splits logs that exceed this size. We need a transform that allows users to merge these lines together to form the single log line they are meant to represent.

Use cases
- docker log events that were split due to size
- journald log events that were split due to size
- file log events based on a start and stop pattern (the current message_start_indicator option)

Spec
This spec is largely based on the comments from @MOZGIII. See #1436 (comment), #1436 (comment), and #1436 (comment).
- The docker source should come pre-configured with this transform since we know Docker will split lines.
- The journald source should come pre-configured with this transform since we know Journald splits lines.
- message_start_indicator should be removed from the file source in favor of using this new transform. We can provide guides that educate the users on this.

Let me know if these requirements are not accurate.
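For the file use case, the pattern-based merge could look roughly like the sketch below. It is only an illustration, not the proposed transform. It assumes a line matching the start pattern begins a new message and every other line continues the previous one, and it does not cover a stop pattern. The function name `merge_by_start_pattern` is hypothetical.

```python
import re


def merge_by_start_pattern(lines, start_pattern):
    """Group lines into messages; a line matching start_pattern opens a new one."""
    start = re.compile(start_pattern)
    merged = []
    current = None  # message being assembled, or None before the first line
    for line in lines:
        if current is None or start.search(line):
            if current is not None:
                merged.append(current)
            current = line
        else:
            # Continuation line: append it to the message in progress.
            current += "\n" + line
    if current is not None:
        merged.append(current)
    return merged
```

For example, with a start pattern that matches a leading date, a stack trace following an error line is folded into that error's message, and the next dated line starts a new one.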