ml: support FLB_ML_TYPE_MAP to parse(#4034) #4375

Merged on Dec 12, 2021 (2 commits)

nokute78 (Collaborator) commented Nov 27, 2021

Fixes #4034

ml_append_try_parser calls flb_parser_do only when the type is FLB_ML_TYPE_TEXT.
With Config 2 from #4034 (comment), the type is FLB_ML_TYPE_MAP,
so the incoming record may not be parsed.

This patch adds support for FLB_ML_TYPE_MAP.

Diff

  • Move L523-L544 into a new function, ml_append_try_parser_type_text.
  • Add a new function, ml_append_try_parser_type_map, for FLB_ML_TYPE_MAP. It looks up the value of key_content and passes it to ml_append_try_parser_type_text.
  • Add a condition: when full_map and buf are both non-NULL, ml_append_try_parser_type_map was called, so we should use buf, which is processed from full_map.
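
The split described in the list above can be pictured with a small, self-contained sketch. It is not the Fluent Bit code: the enum, the `struct kv` record, and the function names below are hypothetical stand-ins, and the "parser" is a placeholder that accepts any non-empty text. It only illustrates the dispatch shape, where the map path looks up key_content and then reuses the text path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the Fluent Bit definitions. */
enum ml_type { ML_TYPE_TEXT, ML_TYPE_MAP };

struct kv {
    const char *key;
    const char *val;
};

/* Placeholder for the text path: "parsing" succeeds on any non-empty text. */
static int try_parser_type_text(const char *text, size_t len)
{
    return (text != NULL && len > 0) ? 0 : -1;
}

/* Map path: find key_content in the record, then reuse the text path. */
static int try_parser_type_map(const struct kv *map, size_t n,
                               const char *key_content)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(map[i].key, key_content) == 0) {
            return try_parser_type_text(map[i].val, strlen(map[i].val));
        }
    }
    return -1; /* key_content is missing, nothing to parse */
}

/* Dispatcher: in this sketch, the MAP branch is the newly added one. */
static int try_parser(enum ml_type type, const char *text, size_t len,
                      const struct kv *map, size_t n, const char *key_content)
{
    switch (type) {
    case ML_TYPE_TEXT:
        return try_parser_type_text(text, len);
    case ML_TYPE_MAP:
        return try_parser_type_map(map, n, key_content);
    }
    return -1;
}
```

Calling `try_parser(ML_TYPE_MAP, NULL, 0, rec, 2, "log")` on a two-entry record that contains a "log" key returns 0 in this sketch, while asking for a key that is not in the record returns -1.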

Known issue

The record is overwritten by the parsed record, so some of the original values are lost, as with Reserve_Data false in filter_parser.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • [N/A] Documentation required for this feature

Configuration file

#4034 (comment)

[INPUT]
  Name              tail
  Path              test.log
  path_key          log_file
  read_from_head    true

[FILTER]
  name                  multiline
  match                 *
  multiline.parser      cri

[OUTPUT]
  name              stdout
  match             *

test.log:

2021-08-30T16:01:00.123456789Z stdout F Single-line log 1
2021-08-30T16:02:00.123456789Z stdout P Multi-line log 2: Start 
2021-08-30T16:02:00.123456789Z stdout P Multi-line log 2: Middle 
2021-08-30T16:02:00.123456789Z stdout F Multi-line log 2: End
2021-08-30T16:03:00.123456789Z stdout F Single-line log 3
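
To make the expected result easier to follow, here is a minimal sketch of how lines in the `<time> <stream> <P|F> <log>` layout shown above can be split and joined: pieces flagged P are partial, and the piece flagged F completes the record. This is not how Fluent Bit implements its built-in cri parser; `struct cri_line`, `cri_parse`, and `cri_append` are hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <string.h>

struct cri_line {
    char time[64];
    char stream[16];
    char flag;          /* 'P' = partial piece, 'F' = final piece */
    const char *log;    /* points into the input line */
};

/* Split "<time> <stream> <P|F> <log>"; returns 0 on success, -1 otherwise. */
static int cri_parse(const char *line, struct cri_line *out)
{
    int off = 0;

    if (sscanf(line, "%63s %15s %c%n", out->time, out->stream,
               &out->flag, &off) != 3) {
        return -1;
    }
    if (out->flag != 'P' && out->flag != 'F') {
        return -1;
    }
    if (line[off] == ' ') {
        out->log = line + off + 1;   /* keep the log text byte for byte */
    }
    else if (line[off] == '\0') {
        out->log = line + off;       /* empty log */
    }
    else {
        return -1;
    }
    return 0;
}

/*
 * Append the log piece of one line to buf (a NUL-terminated string).
 * Returns 1 when the line is flagged 'F' (record complete), 0 when it is
 * flagged 'P' (more pieces expected), and -1 on a parse error or overflow.
 */
static int cri_append(char *buf, size_t size, const char *line)
{
    struct cri_line c;
    size_t used = strlen(buf);
    size_t n;

    if (cri_parse(line, &c) != 0) {
        return -1;
    }
    n = strlen(c.log);
    if (used + n + 1 > size) {
        return -1;
    }
    memcpy(buf + used, c.log, n + 1);
    return c.flag == 'F';
}
```

Feeding the three "Multi-line log 2" lines from test.log into `cri_append` with an initially empty buffer yields 0, 0, and then 1, leaving "Multi-line log 2: Start Multi-line log 2: Middle Multi-line log 2: End" in the buffer.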

Debug output

4$ ../bin/fluent-bit -c a.conf 
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/11/27 21:58:24] [ info] [engine] started (pid=85827)
[2021/11/27 21:58:24] [ info] [storage] version=1.1.5, initializing...
[2021/11/27 21:58:24] [ info] [storage] in-memory
[2021/11/27 21:58:24] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/11/27 21:58:24] [ info] [cmetrics] version=0.2.2
[2021/11/27 21:58:24] [ info] [sp] stream processor started
[2021/11/27 21:58:24] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1453614 watch_fd=1 name=test.log
[0] tail.0: [1630339260.123456789, {"time"=>"2021-08-30T16:01:00.123456789Z", "stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 1"}]
[1] tail.0: [1630339320.123456789, {"time"=>"2021-08-30T16:02:00.123456789Z", "stream"=>"stdout", "_p"=>"P", "log"=>"Multi-line log 2: Start Multi-line log 2: Middle Multi-line log 2: End"}]
[2] tail.0: [1630339380.123456789, {"time"=>"2021-08-30T16:03:00.123456789Z", "stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 3"}]
^C[2021/11/27 21:58:30] [engine] caught signal (SIGINT)
[2021/11/27 21:58:30] [ info] [input] pausing tail.0
[2021/11/27 21:58:30] [ warn] [engine] service will shutdown in max 5 seconds
[2021/11/27 21:58:31] [ info] [engine] service has stopped (0 pending tasks)
[2021/11/27 21:58:31] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=1453614 watch_fd=1

Valgrind output

$ valgrind --leak-check=full ../bin/fluent-bit -c a.conf 
==85830== Memcheck, a memory error detector
==85830== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==85830== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==85830== Command: ../bin/fluent-bit -c a.conf
==85830== 
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/11/27 21:59:06] [ info] [engine] started (pid=85830)
[2021/11/27 21:59:06] [ info] [storage] version=1.1.5, initializing...
[2021/11/27 21:59:06] [ info] [storage] in-memory
[2021/11/27 21:59:06] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/11/27 21:59:06] [ info] [cmetrics] version=0.2.2
[2021/11/27 21:59:06] [ info] [sp] stream processor started
[2021/11/27 21:59:06] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1453614 watch_fd=1 name=test.log
==85830== Warning: client switching stacks?  SP change: 0x57e59f8 --> 0x4cd2820
==85830==          to suppress, use: --max-stackframe=11612632 or greater
==85830== Warning: client switching stacks?  SP change: 0x4cd27c8 --> 0x57e59f8
==85830==          to suppress, use: --max-stackframe=11612720 or greater
==85830== Warning: client switching stacks?  SP change: 0x57e59f8 --> 0x4cd27c8
==85830==          to suppress, use: --max-stackframe=11612720 or greater
==85830==          further instances of this message will not be shown.
[0] tail.0: [1630339260.123456789, {"time"=>"2021-08-30T16:01:00.123456789Z", "stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 1"}]
[1] tail.0: [1630339320.123456789, {"time"=>"2021-08-30T16:02:00.123456789Z", "stream"=>"stdout", "_p"=>"P", "log"=>"Multi-line log 2: Start Multi-line log 2: Middle Multi-line log 2: End"}]
[2] tail.0: [1630339380.123456789, {"time"=>"2021-08-30T16:03:00.123456789Z", "stream"=>"stdout", "_p"=>"F", "log"=>"Single-line log 3"}]
^C[2021/11/27 21:59:12] [engine] caught signal (SIGINT)
[2021/11/27 21:59:12] [ info] [input] pausing tail.0
[2021/11/27 21:59:12] [ warn] [engine] service will shutdown in max 5 seconds
[2021/11/27 21:59:12] [ info] [engine] service has stopped (0 pending tasks)
[2021/11/27 21:59:12] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=1453614 watch_fd=1
==85830== 
==85830== HEAP SUMMARY:
==85830==     in use at exit: 0 bytes in 0 blocks
==85830==   total heap usage: 1,361 allocs, 1,361 frees, 934,074 bytes allocated
==85830== 
==85830== All heap blocks were freed -- no leaks are possible
==85830== 
==85830== For lists of detected and suppressed errors, rerun with: -s
==85830== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

nokute78 (Collaborator, Author) commented

The internal tests failed.

All of the failed cases have the type FLB_ML_REGEX. I will check the type.

  • test_parser_java
  • test_parser_elastic
  • test_issue_3817_1

nokute78 commented Dec 5, 2021

I updated the patch and CI passed.

nokute78 commented Dec 5, 2021

I added an internal test case.

Output of v1.8.10 (without the fix)

Test issue_4034...                              [0] [1638689631.573902862, {"log"=>"2019-05-07T18:57:50.904275087+00:00 stdout P 1a. some "}]
[0] [1638689631.574049482, {"log"=>"2019-05-07T18:57:51.904275088+00:00 stdout P multiline "}]
[0] [1638689631.574198766, {"log"=>"2019-05-07T18:57:52.904275089+00:00 stdout F log"}]
[0] [1638689631.574284214, {"log"=>"2019-05-07T18:57:50.904275087+00:00 stderr P 1b. some "}]
[0] [1638689631.574364852, {"log"=>"2019-05-07T18:57:51.904275088+00:00 stderr P multiline "}]
[0] [1638689631.574445159, {"log"=>"2019-05-07T18:57:52.904275089+00:00 stderr F log"}]
[0] [1638689631.574524355, {"log"=>"2019-05-07T18:57:53.904275090+00:00 stdout P 2a. another "}]
[0] [1638689631.574603651, {"log"=>"2019-05-07T18:57:54.904275091+00:00 stdout P multiline "}]
[0] [1638689631.574682165, {"log"=>"2019-05-07T18:57:55.904275092+00:00 stdout F log"}]
[0] [1638689631.574760810, {"log"=>"2019-05-07T18:57:53.904275090+00:00 stderr P 2b. another "}]
[0] [1638689631.574839895, {"log"=>"2019-05-07T18:57:54.904275091+00:00 stderr P multiline "}]
[0] [1638689631.574918720, {"log"=>"2019-05-07T18:57:55.904275092+00:00 stderr F log"}]
[0] [1638689631.574996814, {"log"=>"2019-05-07T18:57:56.904275093+00:00 stdout F 3a. non multiline 1"}]
[0] [1638689631.575114150, {"log"=>"2019-05-07T18:57:57.904275094+00:00 stdout F 4a. non multiline 2"}]
[0] [1638689631.575197413, {"log"=>"2019-05-07T18:57:56.904275093+00:00 stderr F 3b. non multiline 1"}]
[0] [1638689631.575276027, {"log"=>"2019-05-07T18:57:57.904275094+00:00 stderr F 4b. non multiline 2"}]

----- MULTILINE FLUSH -----
[0] [1638689631.573902862, {"log"=>"2019-05-07T18:57:50.904275087+00:00 stdout P 1a. some 2019-05-07T18:57:51.904275088+00:00 stdout P multiline 2019-05-07T18:57:52.904275089+00:00 stdout F log2019-05-07T18:57:50.904275087+00:00 stderr P 1b. some 2019-05-07T18:57:51.904275088+00:00 stderr P multiline 2019-05-07T18:57:52.904275089+00:00 stderr F log2019-05-07T18:57:53.904275090+00:00 stdout P 2a. another 2019-05-07T18:57:54.904275091+00:00 stdout P multiline 2019-05-07T18:57:55.904275092+00:00 stdout F log2019-05-07T18:57:53.904275090+00:00 stderr P 2b. another 2019-05-07T18:57:54.904275091+00:00 stderr P multiline 2019-05-07T18:57:55.904275092+00:00 stderr F log2019-05-07T18:57:56.904275093+00:00 stdout F 3a. non multiline 12019-05-07T18:57:57.904275094+00:00 stdout F 4a. non multiline 22019-05-07T18:57:56.904275093+00:00 stderr F 3b. non multiline 12019-05-07T18:57:57.904275094+00:00 stderr F 4b. non multiline 2"}]
----------- EOF -----------
[ FAILED ]
  multiline.c:402: Check val.via.str.size == len... failed
expected length: 22, received: 890
== received ==
"2019-05-07T18:57:50.904275087+00:00 stdout P 1a. some 2019-05-07T18:57:51.904275088+00:00 stdout P multiline 2019-05-07T18:57:52.904275089+00:00 stdout F log2019-05-07T18:57:50.904275087+00:00 stderr P 1b. some 2019-05-07T18:57:51.904275088+00:00 stderr P multiline 2019-05-07T18:57:52.904275089+00:00 stderr F log2019-05-07T18:57:53.904275090+00:00 stdout P 2a. another 2019-05-07T18:57:54.904275091+00:00 stdout P multiline 2019-05-07T18:57:55.904275092+00:00 stdout F log2019-05-07T18:57:53.904275090+00:00 stderr P 2b. another 2019-05-07T18:57:54.904275091+00:00 stderr P multiline 2019-05-07T18:57:55.904275092+00:00 stderr F log2019-05-07T18:57:56.904275093+00:00 stdout F 3a. non multiline 12019-05-07T18:57:57.904275094+00:00 stdout F 4a. non multiline 22019-05-07T18:57:56.904275093+00:00 stderr F 3b. non multiline 12019-05-07T18:57:57.904275094+00:00 stderr F 4b. non multiline 2"

== expected ==
1a. some multiline log

Output of the PR

Test issue_4034...                              [0] [1638689838.834384498, {"log"=>"2019-05-07T18:57:50.904275087+00:00 stdout P 1a. some "}]
[0] [1638689838.836779627, {"log"=>"2019-05-07T18:57:51.904275088+00:00 stdout P multiline "}]
[0] [1638689838.836938022, {"log"=>"2019-05-07T18:57:52.904275089+00:00 stdout F log"}]

----- MULTILINE FLUSH -----
[0] [1557255470.904275087, {"time"=>"2019-05-07T18:57:50.904275087+00:00", "stream"=>"stdout", "_p"=>"P", "log"=>"1a. some multiline log"}]
----------- EOF -----------
[0] [1638689838.846153021, {"log"=>"2019-05-07T18:57:50.904275087+00:00 stderr P 1b. some "}]
[0] [1638689838.846240474, {"log"=>"2019-05-07T18:57:51.904275088+00:00 stderr P multiline "}]
[0] [1638689838.846423804, {"log"=>"2019-05-07T18:57:52.904275089+00:00 stderr F log"}]

----- MULTILINE FLUSH -----
[0] [1557255470.904275087, {"time"=>"2019-05-07T18:57:50.904275087+00:00", "stream"=>"stderr", "_p"=>"P", "log"=>"1b. some multiline log"}]
----------- EOF -----------
[0] [1638689838.846796467, {"log"=>"2019-05-07T18:57:53.904275090+00:00 stdout P 2a. another "}]
[0] [1638689838.846905299, {"log"=>"2019-05-07T18:57:54.904275091+00:00 stdout P multiline "}]
[0] [1638689838.846984857, {"log"=>"2019-05-07T18:57:55.904275092+00:00 stdout F log"}]

----- MULTILINE FLUSH -----
[0] [1557255473.904275090, {"time"=>"2019-05-07T18:57:53.904275090+00:00", "stream"=>"stdout", "_p"=>"P", "log"=>"2a. another multiline log"}]
----------- EOF -----------
[0] [1638689838.847200778, {"log"=>"2019-05-07T18:57:53.904275090+00:00 stderr P 2b. another "}]
[0] [1638689838.847275507, {"log"=>"2019-05-07T18:57:54.904275091+00:00 stderr P multiline "}]
[0] [1638689838.847352299, {"log"=>"2019-05-07T18:57:55.904275092+00:00 stderr F log"}]

----- MULTILINE FLUSH -----
[0] [1557255473.904275090, {"time"=>"2019-05-07T18:57:53.904275090+00:00", "stream"=>"stderr", "_p"=>"P", "log"=>"2b. another multiline log"}]
----------- EOF -----------
[0] [1638689838.847742624, {"log"=>"2019-05-07T18:57:56.904275093+00:00 stdout F 3a. non multiline 1"}]

----- MULTILINE FLUSH -----
[0] [1557255476.904275093, {"time"=>"2019-05-07T18:57:56.904275093+00:00", "stream"=>"stdout", "_p"=>"F", "log"=>"3a. non multiline 1"}]
----------- EOF -----------
[0] [1638689838.847959838, {"log"=>"2019-05-07T18:57:57.904275094+00:00 stdout F 4a. non multiline 2"}]

----- MULTILINE FLUSH -----
[0] [1557255477.904275094, {"time"=>"2019-05-07T18:57:57.904275094+00:00", "stream"=>"stdout", "_p"=>"F", "log"=>"4a. non multiline 2"}]
----------- EOF -----------
[0] [1638689838.848172984, {"log"=>"2019-05-07T18:57:56.904275093+00:00 stderr F 3b. non multiline 1"}]

----- MULTILINE FLUSH -----
[0] [1557255476.904275093, {"time"=>"2019-05-07T18:57:56.904275093+00:00", "stream"=>"stderr", "_p"=>"F", "log"=>"3b. non multiline 1"}]
----------- EOF -----------
[0] [1638689838.848435963, {"log"=>"2019-05-07T18:57:57.904275094+00:00 stderr F 4b. non multiline 2"}]

----- MULTILINE FLUSH -----
[0] [1557255477.904275094, {"time"=>"2019-05-07T18:57:57.904275094+00:00", "stream"=>"stderr", "_p"=>"F", "log"=>"4b. non multiline 2"}]
----------- EOF -----------
[ OK ]

This test case sends CRI logs as FLB_ML_TYPE_MAP.

Signed-off-by: Takahiro Yamashita <[email protected]>
edsiper merged commit a98528e into fluent:master on Dec 12, 2021
nokute78 deleted the fix_4034 branch on December 12, 2021 08:18
Issue this pull request may close: Multiline Parser: Built-in CRI parser does not work in multiline filter