
message_stop_indicator for file source #1431

Closed
anton-ryzhov opened this issue Dec 24, 2019 · 16 comments · Fixed by #1852
Assignees
Labels
needs: requirements Needs a list of requirements before work can begin · platform: docker Anything `docker` platform related · source: file Anything `file` source related · type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@anton-ryzhov
Contributor

anton-ryzhov commented Dec 24, 2019

In addition to message_start_indicator, the opposite parameter could be very useful.

For example, the docker json log driver splits long lines at 16k, and only the very last chunk contains \r\n.

```
{"log":"First line\r\n","stream":"stdout","time":"2019-12-16T17:25:16.44039104Z"}
{"log":"Second...","stream":"stdout","time":"2019-12-16T17:28:28.448678893Z"}
{"log":"... line\r\n","stream":"stdout","time":"2019-12-16T17:28:28.448700857Z"}
{"log":"Third line\r\n","stream":"stdout","time":"2019-12-16T17:28:28.448700857Z"}
```

By defining message_start_indicator = '\\n"' I get "Second..." merged into the first line rather than into its continuation.

So the only difference is how unmatched lines are combined: with the previous matched line, or with the next one.
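The two semantics can be illustrated with a small sketch (Python here for brevity; a plain substring check stands in for the real regex option):

```python
def group_lines(lines, indicator, stop_mode):
    """Group raw lines into messages.

    stop_mode=False mimics message_start_indicator: unmatched lines are
    appended to the PREVIOUS group.  stop_mode=True mimics the proposed
    message_stop_indicator: unmatched lines are held back and merged into
    the NEXT matching line.
    """
    groups, pending = [], []
    for line in lines:
        hit = indicator in line  # simplified: the real option takes a regex
        if stop_mode:
            pending.append(line)
            if hit:  # a matching line terminates the message
                groups.append("\n".join(pending))
                pending = []
        else:
            if hit and pending:  # a matching line starts a new message
                groups.append("\n".join(pending))
                pending = []
            pending.append(line)
    if pending:
        groups.append("\n".join(pending))
    return groups

# Simplified docker chunks: "|" stands in for the trailing \r\n marker.
chunks = ["First line|", "Second...", "... line|", "Third line|"]
```

With stop_mode=True, "Second..." is merged into its continuation "... line|"; with stop_mode=False it gets wrongly glued onto "First line|", which is exactly the behavior described above.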

@ghost ghost added source: file Anything `file` source related type: enhancement A value-adding code change that enhances its existing functionality. labels Dec 25, 2019
@binarylogic
Contributor

Thanks @anton-ryzhov, I was actually just discussing this exact problem on Docker with @MOZGIII. @MOZGIII what do you think about taking this as your next issue?

@binarylogic binarylogic added the platform: docker Anything `docker` platform related label Dec 26, 2019
@MOZGIII
Contributor

MOZGIII commented Dec 26, 2019

Right, this is exactly the problem we discussed. I can work on it.

@MOZGIII
Contributor

MOZGIII commented Dec 26, 2019

message_start_indicator = '\\n"' is a smart trick; however, the proper fix for this issue should probably be implemented at a different level, to ensure it works with the gelf driver too.

@anton-ryzhov
Contributor Author

@binarylogic @MOZGIII good to know.

The Docker problem is more complex in the end, so I just started with this more universal feature request for a complementary message split. It could be useful beyond Docker.


Speaking of Docker (this probably deserves a separate ticket): I feel it should be handled at the next level, as a transform. But to do that we need to split messages properly at the "source" level, and detecting by '\\n"' looks like a dirty hack to me.

Another reason to merge Docker logs at the transform step: Docker logs can be read from both the journald and docker sources, and the same logic should apply to both. But we can't merge two events at the transform step.

Would it be possible to have a transformation that has access to the current and previous events [from the same source] and can keep/delete/replace them? A kind of reducing function with a sliding window.
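The transform described here is essentially a fold over the event stream. A minimal Python sketch (the `partial` flag and the dict-shaped events are hypothetical, not Vector's actual API):

```python
def sliding_reduce(events, should_merge, combine):
    """Emit events, optionally merging each one into its predecessor.

    should_merge(prev, cur) decides whether cur continues prev;
    combine(prev, cur) replaces the previous event with the merged one.
    """
    out = []
    for event in events:
        if out and should_merge(out[-1], event):
            out[-1] = combine(out[-1], event)  # replace the previous event
        else:
            out.append(event)  # keep as a new event
    return out

# Hypothetical docker-style partial events:
events = [
    {"log": "Second...", "partial": True},
    {"log": "... line", "partial": False},
]
merged = sliding_reduce(
    events,
    should_merge=lambda prev, cur: prev["partial"],
    combine=lambda prev, cur: {"log": prev["log"] + cur["log"],
                               "partial": cur["partial"]},
)
```

Here the two partial events collapse into a single one whose `log` is the concatenation of the chunks.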

@binarylogic binarylogic added the needs: requirements Needs a a list of requirements before work can be begin label Dec 26, 2019
@MOZGIII
Contributor

MOZGIII commented Dec 26, 2019

> So I just started with this more universal feature request of complementary message split.

Sounds good!

I was thinking about solving the particular case with Docker where the inner log messages are JSON: in that case, it'd be possible to just use a streaming JSON parser and join across multiple messages until the parser completes or reaches an unexpected token.
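A minimal version of that idea, assuming the inner payloads are JSON (Python's json module stands in for a real streaming parser; a buffer that happens to parse too early, e.g. a bare number, would fool this sketch):

```python
import json

def merge_json_chunks(chunks):
    """Accumulate string chunks until the buffer parses as complete JSON."""
    merged, buf = [], ""
    for chunk in chunks:
        buf += chunk
        try:
            merged.append(json.loads(buf))
            buf = ""            # complete document: emit and reset
        except json.JSONDecodeError:
            pass                # incomplete: keep accumulating
    return merged

# A message split across two chunks, followed by a whole one:
docs = merge_json_chunks(['{"msg": "Second', ' line"}', '{"msg": "Third"}'])
```

A real implementation would need a parser that reports "incomplete" separately from "invalid", so a malformed chunk doesn't cause unbounded buffering.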
It may be possible to solve this issue more generically, so having a detailed feature request in the first place would be very valuable.

@anton-ryzhov
Contributor Author

I want to share my approach/workaround. It doesn't work as expected because of this ticket, but maybe it will help you come up with a better solution.

```toml
[sources.docker_logs]
  type = "file"
  include = ["containers/*/*.log"]

  message_start_indicator = '\\n"'
  # Should be:
  # message_stop_indicator = '\\n"'

[transforms.newlines_to_array]
  type = "lua"
  inputs = ["docker_logs"]
  source = """
   event["message"] = "{\\"chunks\\": [" .. string.gsub(event["message"], "\\n", ",") .. "]}"
  """

[transforms.json_parser]
  type = "json_parser"
  inputs = ["newlines_to_array"]

[transforms.merge_chunks]
  type = "lua"
  inputs = ["json_parser"]
  source = """
    event["time"] = event["chunks[0].time"]
    event["stream"] = event["chunks[0].stream"]
    event["log"] = ""

    local i = 0
    while event["chunks[" .. i .. "].time"] do
      event["log"] = event["log"] .. event["chunks[" .. i .. "].log"]
      event["chunks[" .. i .. "].time"] = nil
      event["chunks[" .. i .. "].log"] = nil
      event["chunks[" .. i .. "].stream"] = nil
      i = i + 1
    end
  """
```

@binarylogic binarylogic added this to the Improve log continuation strategy milestone Jan 7, 2020
@MOZGIII
Contributor

MOZGIII commented Feb 5, 2020

Now that #1504 has landed, it should be possible to achieve what you're looking for here via a merge transform.
We'll continue to look into this issue. We might integrate the merge transform into the file source to provide a built-in solution.

@anton-ryzhov
Contributor Author

Now that there is a universal merge transform, maybe sources shouldn't do this in general (though it's 100% reasonable for the docker source).

So maybe the idea behind this issue is outdated now.

@MOZGIII
Contributor

MOZGIII commented Feb 7, 2020

It's an ongoing process to determine what sources should support built-in merging, and how.

With built-in merging, we're looking for the following:

  • better defaults where applicable: docker is an obvious example, and the same goes for kubernetes; journald is not so trivial though, we can discuss it at Merge partial events at journald source #1719 if you're interested;
  • better performance: in theory, we can get significant performance wins in some cases if we implement the merge transform at the source level;
  • better internal implementation: this is related to both points above, but a bit different; in essence, when a source can produce multiple streams (like docker or journald), those streams can be accessed individually inside the source internals with less overhead and better reliability than by using stream_discriminant_fields; so, in those cases, we can provide an easier-to-understand configuration interface for end users, even if merging is not enabled by default for a particular source.

So, I guess that covers the list of items we weigh when deciding whether to implement merging at a source.
I am under the impression that we definitely can't cover all possible needs by merging at sources, so we need a merge transform.

The current implementation has a design trait that stashes partial events and won't release them until it can produce a merged one. Configuring this can be error-prone at times and can cause the transform to consume all system memory if used incorrectly.
Another issue is that when a partial event is read from the source and gets stashed at the merge transform, the source is not aware that the event has not been processed, and will not redeliver it if Vector restarts. This is the main problem we see with the design, and the reason we're currently only shipping the transform as a beta.

We probably should add it to the documentation, so people are aware.

@MOZGIII
Contributor

MOZGIII commented Feb 18, 2020

We'll be adding some new parameters to the file source, taking inspiration from https://app.scalyr.com/help/parsing-logs#multiline

In particular, in addition to message_start_indicator, we'll add new parameters, based on how Scalyr does it:

  • continueThrough: all consecutive lines matching this pattern are included in the group. The first line (the line that matched the start pattern) does not need to match the continueThrough pattern. This is useful in cases such as a Java stack trace, where some indicator in the line (such as leading whitespace) indicates that it is an extension of the preceding line.

  • continuePast: all consecutive lines matching this pattern, plus one additional line, are included in the group. This is useful in cases where a log message ends with a continuation marker, such as a backslash, indicating that the following line is part of the same message.

  • haltBefore: all consecutive lines not matching this pattern are included in the group. This is useful where a log line contains a marker indicating that it begins a new message.

  • haltWith: all consecutive lines, up to and including the first line matching this pattern, are included in the group. This is useful where a log line ends with a termination marker, such as a semicolon.
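The four modes can be sketched as a single grouping loop. A simplified Python illustration (grouping is driven by the condition pattern alone here, whereas the Scalyr scheme also involves the start pattern):

```python
import re

def group(lines, condition, mode):
    """Group lines into multi-line messages per the four modes above."""
    cond = re.compile(condition)
    groups, buf = [], []

    def flush():
        if buf:
            groups.append("\n".join(buf))
            buf.clear()

    for line in lines:
        if mode == "continue_through":
            # continuation lines match; anything else begins a new group
            if buf and cond.search(line):
                buf.append(line)
            else:
                flush()
                buf.append(line)
        elif mode == "continue_past":
            # a matching line says "one more line follows"
            buf.append(line)
            if not cond.search(line):
                flush()
        elif mode == "halt_before":
            # a matching line begins a new group
            if buf and not cond.search(line):
                buf.append(line)
            else:
                flush()
                buf.append(line)
        elif mode == "halt_with":
            # a matching line terminates the group
            buf.append(line)
            if cond.search(line):
                flush()
    flush()
    return groups
```

For example, a Java stack trace groups with `group(lines, r"^\s+at", "continue_through")`, and statements terminated by semicolons with `group(lines, r";$", "halt_with")`.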

@binarylogic
Contributor

That makes sense to me. While we're adding these options, would it be possible to nest them under a new multiline table? I would like to keep the current message_start_indicator option, but simply remove it from the docs. Ex:

```toml
[sources.in]
  type = "file"

  [sources.in.multiline]
    start = "..."
    continue_through = "..."
    continue_past = "..."
    halt_before = "..."
    halt_with = "..."
```

What do you think? Also, notice the name is snake case.

@MOZGIII
Contributor

MOZGIII commented Feb 19, 2020

I like the nesting! I think it'll map to the internal implementation better than the way we're doing it right now.
Regarding the exact config option set and names: definitely snake case, but having it as Scalyr does is a bit confusing, since you have to specify a start pattern plus exactly one of the continuation patterns. It'd be easier for everyone if we just had three fields:

  • start
  • continuation
  • mode - an enum of continue_through/continue_past/halt_before/halt_with

Do you like this?

@binarylogic
Contributor

I’m a fan. The ‘mode’ option looks good to me.

@anton-ryzhov
Contributor Author

But for the halt modes it is not a continuation.

Don't just copy their config; you can make it much more intuitive.

continue and halt could be expressed through mutually exclusive pattern definitions, without an explicit mode.

For continueThrough:

```toml
start = "^Error"
through = "^     at"
```

For haltWith:

```toml
start = "^Error"
stop = "Summary line"
```

The haltBefore mode sounds to me like the start pattern of the next message: if you meet start again, just start a new event.

The continuePast mode for the backslash example could be stop = "[^\\]$"

@MOZGIII
Contributor

MOZGIII commented Feb 19, 2020

Thanks, @anton-ryzhov!

I've started working on the implementation, and I too noticed that we can probably do even better than with just mode. I'm working with the mode field for now, and I think better configuration parameters will emerge in the process.

Your point is definitely valid; it was an oversight in the initial design pass. What you're proposing looks better, and I'll consider modeling the configuration like that, since it also seems to be a better fit for the implementation needs.

By the way, I've added a couple of real-world test cases, and I'm looking to add more to make our implementation easier to maintain.

For example, for Java stack traces I have this:

```rust
let lines = vec![
    "java.lang.Exception",
    "    at com.foo.bar(bar.java:123)",
    "    at com.foo.baz(baz.java:456)",
];
let config = Config {
    start_pattern: Regex::new("^[^\\s]").unwrap(),
    condition_pattern: Regex::new("^[\\s]+at").unwrap(),
    mode: Mode::ContinueThrough,
    // ...
};
let expected = vec![
    concat!(
        "java.lang.Exception\n",
        "    at com.foo.bar(bar.java:123)\n",
        "    at com.foo.baz(baz.java:456)",
    ),
];
run_and_assert(&lines, config, &expected);
```

I also have a use case test for Ruby stack traces.

If you'd like to, now would be a good time to share the real-world usage examples that are important to you - and we add them as test cases too!

@MOZGIII
Contributor

MOZGIII commented Feb 23, 2020

Hey @anton-ryzhov, a follow-up after we merged the PR! We ended up going with the start_pattern, condition_pattern, mode, and timeout_ms fields.

I'll continue with a follow-up issue on improving the configuration fields.
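Combined with the nesting proposed earlier, a configuration along these lines should result (a sketch based on the field names above; check the Vector docs for the exact syntax in your version):

```toml
[sources.app_logs]
  type = "file"
  include = ["containers/*/*.log"]

  [sources.app_logs.multiline]
    start_pattern = '^[^\s]'        # a line that begins a new message
    condition_pattern = '^[\s]+at'  # continuation lines (Java stack trace)
    mode = "continue_through"
    timeout_ms = 1000               # flush an incomplete group after 1s
```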
