Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiline Parser: Built-in CRI parser does not work in multiline filter #4034

Closed
pranavmarla opened this issue Aug 31, 2021 · 9 comments · Fixed by #4375
Closed

Multiline Parser: Built-in CRI parser does not work in multiline filter #4034

pranavmarla opened this issue Aug 31, 2021 · 9 comments · Fixed by #4375
Assignees

Comments

@pranavmarla
Copy link

pranavmarla commented Aug 31, 2021

Bug Report

Describe the bug
The built-in CRI multiline parser only works when it is part of the tail input plugin. If we add it later, as part of a multiline filter, it doesn't work even though I believe it should in theory have the same effect.

This may be related to #4033.

To Reproduce
Steps to reproduce the problem:

  • Create the following test input log file, with 3 logs (log 2 is multi-line):
2021-08-30T16:01:00.123456789Z stdout F Single-line log 1
2021-08-30T16:02:00.123456789Z stdout P Multi-line log 2: Start 
2021-08-30T16:02:00.123456789Z stdout P Multi-line log 2: Middle 
2021-08-30T16:02:00.123456789Z stdout F Multi-line log 2: End
2021-08-30T16:03:00.123456789Z stdout F Single-line log 3
  • Create Fluent Bit config 1 (CRI multiline parsing happens at input):
[INPUT]
  Name              tail
  Path              test.log
  path_key          log_file
  read_from_head    true
  multiline.parser  cri

[OUTPUT]
  name              stdout
  match             *
  • Create Fluent Bit config 2 (CRI multiline parsing happens after input, in filter):
[INPUT]
  Name              tail
  Path              test.log
  path_key          log_file
  read_from_head    true

[FILTER]
  name                  multiline
  match                 *
  multiline.parser      cri

[OUTPUT]
  name              stdout
  match             *

Expected behavior
Since I am using the same built-in CRI multiline parser in both Fluent Bit configs, I expect the same results. Instead, the lines are correctly parsed only for config 1 -- config 2 does not work.

Specifically, this is the output we get:

  • Config 1: All 3 log records are correctly processed
[0] tail.0: [1630339260.123456789, {"time"=>"2021-08-30T16:01:00.123456789Z", "stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 1", "log_file"=>"test.log"}]

[1] tail.0: [1630339320.123456789, {"time"=>"2021-08-30T16:02:00.123456789Z", "stream"=>"stdout", "_p"=>"P", "log"=>"Multi-line log 2: Start Multi-line log 2: Middle Multi-line log 2: End", "log_file"=>"test.log"}]

[2] tail.0: [1630339380.123456789, {"time"=>"2021-08-30T16:03:00.123456789Z", "stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 3", "log_file"=>"test.log"}]
  • Config 2: All 3 log records are wrongly concatenated together, as if they were all part of the same multi-line log
[0] tail.0: [1630429988.270397573, {"log_file"=>"test.log", "log"=>"2021-08-30T16:01:00.123456789Z stdout F Single-line log 12021-08-30T16:02:00.123456789Z stdout P Multi-line log 2: Start 2021-08-30T16:02:00.123456789Z stdout P Multi-line log 2: Middle 2021-08-30T16:02:00.123456789Z stdout F Multi-line log 2: End2021-08-30T16:03:00.123456789Z stdout F Single-line log 3"}]

Your Environment

  • Version used: 1.8.4

Additional context
The multiline filter doc does recommend that you use config 1, but it doesn't say that you have to use it -- i.e. it doesn't seem to be expected behaviour that you cannot use config 2:

If you aim to concatenate messages split originally by Docker or CRI container engines, we recommend doing the concatenation on Tail plugin, this same functionality exists there.

The reason we might want to use config 2 instead is, for example, if we want to do some custom manipulation of the log BEFORE it is parsed.

@pranavmarla pranavmarla changed the title Multiline Parser: CRI parser does not work in multiline filter Multiline Parser: Built-in CRI parser does not work in multiline filter Aug 31, 2021
@pranavmarla
Copy link
Author

FYI @edsiper

@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added Stale and removed Stale labels Oct 2, 2021
@github-actions
Copy link
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Nov 21, 2021
@github-actions
Copy link
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@edsiper
Copy link
Member

edsiper commented Nov 27, 2021

@nokute78 can you take a look at this one pls ?

@nokute78
Copy link
Collaborator

@edsiper The output of Config 2 is a concatenated non parsed line.

ml_append_try_parser calls flb_parser_do only if a type is FLB_ML_TYPE_TEXT.
The type will be FLB_ML_TYPE_MAP in case of Config 2.
So incoming record may not be parsed.

@nokute78
Copy link
Collaborator

I added a filter_parser plugin to parse cri log before multiline parser.
It worked.
So, I think it is need to add flb_parser_do in case ofFLB_ML_TYPE_MAP.

[SERVICE]
  Log_Level info
  Parsers_File parsers.conf

[INPUT]
  Name              tail
  Path              test.log
  path_key          log_file
  read_from_head    true

## Add filter_parser
[FILTER]
  name              parser
  match             *
  Key_Name          log
  Parser            cri

[FILTER]
  name                  multiline
  match                 *
  multiline.parser      cri

[OUTPUT]
  name              stdout
  match             *
[PARSER]
    Name cri
    Format regex
    Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<_p>[^ ]*) (?<log>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z

Output:

$ ../bin/fluent-bit -c aa.conf 
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/11/27 17:26:09] [ info] [engine] started (pid=35862)
[2021/11/27 17:26:09] [ info] [storage] version=1.1.5, initializing...
[2021/11/27 17:26:09] [ info] [storage] in-memory
[2021/11/27 17:26:09] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/11/27 17:26:09] [ info] [cmetrics] version=0.2.2
[2021/11/27 17:26:09] [ info] [sp] stream processor started
[2021/11/27 17:26:09] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1453614 watch_fd=1 name=test.log
[0] tail.0: [1630339260.123456789, {"stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 1"}]
[1] tail.0: [1630339320.123456789, {"stream"=>"stdout", "_p"=>"P", "log"=>"Multi-line log 2: Start Multi-line log 2: Middle Multi-line log 2: End"}]
[2] tail.0: [1630339380.123456789, {"stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 3"}]
^C[2021/11/27 17:26:14] [engine] caught signal (SIGINT)
[2021/11/27 17:26:14] [ info] [input] pausing tail.0
[2021/11/27 17:26:14] [ warn] [engine] service will shutdown in max 5 seconds
[2021/11/27 17:26:15] [ info] [engine] service has stopped (0 pending tasks)

nokute78 added a commit to nokute78/fluent-bit that referenced this issue Nov 27, 2021
nokute78 added a commit to nokute78/fluent-bit that referenced this issue Nov 27, 2021
@nokute78
Copy link
Collaborator

I sent a patch #4375.

It causes side effect.
New record only contains parsed field.
All other original fields will be removed like Reserve_Data false of filter_parser.

@github-actions github-actions bot removed the Stale label Nov 28, 2021
nokute78 added a commit to nokute78/fluent-bit that referenced this issue Dec 5, 2021
nokute78 added a commit to nokute78/fluent-bit that referenced this issue Dec 5, 2021
It is the test case to send cri log as FLB_ML_TYPE_MAP.

Signed-off-by: Takahiro Yamashita <[email protected]>
nokute78 added a commit to nokute78/fluent-bit that referenced this issue Dec 5, 2021
It is the test case to send cri log as FLB_ML_TYPE_MAP.

Signed-off-by: Takahiro Yamashita <[email protected]>
edsiper pushed a commit that referenced this issue Dec 12, 2021
edsiper pushed a commit that referenced this issue Dec 12, 2021
It is the test case to send cri log as FLB_ML_TYPE_MAP.

Signed-off-by: Takahiro Yamashita <[email protected]>
0Delta pushed a commit to 0Delta/fluent-bit that referenced this issue Jan 20, 2022
0Delta pushed a commit to 0Delta/fluent-bit that referenced this issue Jan 20, 2022
It is the test case to send cri log as FLB_ML_TYPE_MAP.

Signed-off-by: Takahiro Yamashita <[email protected]>
@missourian55
Copy link

@nokute78
I tested this

I added a filter_parser plugin to parse cri log before multiline parser. It worked. So, I think it is need to add flb_parser_do in case ofFLB_ML_TYPE_MAP.

[SERVICE]
  Log_Level info
  Parsers_File parsers.conf

[INPUT]
  Name              tail
  Path              test.log
  path_key          log_file
  read_from_head    true

## Add filter_parser
[FILTER]
  name              parser
  match             *
  Key_Name          log
  Parser            cri

[FILTER]
  name                  multiline
  match                 *
  multiline.parser      cri

[OUTPUT]
  name              stdout
  match             *
[PARSER]
    Name cri
    Format regex
    Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<_p>[^ ]*) (?<log>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z

Output:

$ ../bin/fluent-bit -c aa.conf 
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/11/27 17:26:09] [ info] [engine] started (pid=35862)
[2021/11/27 17:26:09] [ info] [storage] version=1.1.5, initializing...
[2021/11/27 17:26:09] [ info] [storage] in-memory
[2021/11/27 17:26:09] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/11/27 17:26:09] [ info] [cmetrics] version=0.2.2
[2021/11/27 17:26:09] [ info] [sp] stream processor started
[2021/11/27 17:26:09] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1453614 watch_fd=1 name=test.log
[0] tail.0: [1630339260.123456789, {"stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 1"}]
[1] tail.0: [1630339320.123456789, {"stream"=>"stdout", "_p"=>"P", "log"=>"Multi-line log 2: Start Multi-line log 2: Middle Multi-line log 2: End"}]
[2] tail.0: [1630339380.123456789, {"stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 3"}]
^C[2021/11/27 17:26:14] [engine] caught signal (SIGINT)
[2021/11/27 17:26:14] [ info] [input] pausing tail.0
[2021/11/27 17:26:14] [ warn] [engine] service will shutdown in max 5 seconds
[2021/11/27 17:26:15] [ info] [engine] service has stopped (0 pending tasks)

I tested it and not seeing the expected behavior

Fluent Bit v2.0.11
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/04/30 19:10:28] [ info] [fluent bit] version=2.0.11, commit=, pid=86153
[2023/04/30 19:10:28] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/04/30 19:10:28] [ info] [cmetrics] version=0.6.0
[2023/04/30 19:10:28] [ info] [ctraces ] version=0.3.0
[2023/04/30 19:10:28] [ info] [input:tail:tail.0] initializing
[2023/04/30 19:10:28] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2023/04/30 19:10:28] [ info] [filter:multiline:multiline.1] created emitter: emitter_for_multiline.1
[2023/04/30 19:10:28] [ info] [input:emitter:emitter_for_multiline.1] initializing
[2023/04/30 19:10:28] [ info] [input:emitter:emitter_for_multiline.1] storage_strategy='memory' (memory only)
[2023/04/30 19:10:28] [ info] [output:stdout:stdout.0] worker #0 started
[2023/04/30 19:10:28] [ info] [sp] stream processor started
[2023/04/30 19:10:28] [ info] [filter:multiline:multiline.1] created new multiline stream for tail.0_tail.0
[0] tail.0: [1630339260.123456789, {"stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 1"}]
[1] tail.0: [1630339320.123456789, {"stream"=>"stdout", "_p"=>"P", "log"=>"Multi-line log 2: Start "}]
[2] tail.0: [1630339320.123456789, {"stream"=>"stdout", "_p"=>"P", "log"=>"Multi-line log 2: Middle "}]
[3] tail.0: [1630339320.123456789, {"stream"=>"stdout", "_p"=>"F", "log"=>"Multi-line log 2: End"}]
[4] tail.0: [1630339380.123456789, {"stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 3"}]
^C[2023/04/30 19:12:15] [engine] caught signal (SIGINT)
[2023/04/30 19:12:15] [ info] [input] pausing tail.0
[2023/04/30 19:12:15] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2023/04/30 19:12:15] [ info] [output:stdout:stdout.0] thread worker #0 stopped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants