Journald position file got corrupted during multiple fluentd gracefulReloads #3332
Comments
I spent a few hours this morning trying to reproduce this issue, but I couldn't. I set up the following configuration:
<system>
log_level error
rpc_endpoint "127.0.0.1:24444"
</system>
<source>
@type systemd
@id in_systemd_docker
path "/var/log/journal"
tag "systemd.unit"
read_from_head false
<storage>
@type "local"
persistent true
path "kube-fluentd-operator--fluentd-journald-cursor.json"
</storage>
<entry>
field_map {"_SYSTEMD_UNIT":"unit","MESSAGE":"log","_PID":["pid"],"_PRIORITY":"priority","_COMM":"cmd","_HOSTNAME":"hostname"}
field_map_strict true
fields_lowercase true
</entry>
</source>
<match systemd.**>
@type stdout
</match>
and let the following bash script run for a while:
I reloaded Fluentd 1000 times in total, but did not see any file corruption issue like the
I'll try to check if reloading another 1000 times reproduces the issue, but with no
It occurred to me that it might be a concurrency issue. So I set up two
It turned out that the Journald plugin is indeed racy on gracefulReload.
#3335 is my fix for the race issue. It resolves the issue by changing
While I did not reproduce the exact error of @alex-vmw's report, I
Describe the bug
We were performing tests that were issuing multiple /api/config.gracefulReload calls to fluentd. After a few reloads we noticed that on a single node the journald position file got corrupted.
Error that we saw:
What we saw in the corrupted file was this:
As you can see, the {} in front was supposed to be {" instead.
To Reproduce
Restart fluentd many times via /api/config.gracefulReload call.
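The reproduction step above could be scripted roughly as follows. This is only a sketch, not the script used in the comments: it assumes the RPC endpoint 127.0.0.1:24444 from the maintainer's test configuration, and the names `reload_url` and `graceful_reload_loop` are made up for illustration.

```python
import time
import urllib.request

# Assumption: same value as rpc_endpoint in the <system> section of the test config.
RPC_ENDPOINT = "127.0.0.1:24444"


def reload_url(endpoint=RPC_ENDPOINT):
    """Build the URL of Fluentd's gracefulReload RPC call."""
    return f"http://{endpoint}/api/config.gracefulReload"


def graceful_reload_loop(count, delay=0.0, fetch=None, endpoint=RPC_ENDPOINT):
    """Request gracefulReload `count` times and return how many calls got HTTP 200.

    `fetch` takes a URL and returns an HTTP status code. It is injectable so
    the loop can be exercised without a running Fluentd.
    """
    if fetch is None:
        def fetch(url):
            # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status

    ok = 0
    for _ in range(count):
        if fetch(reload_url(endpoint)) == 200:
            ok += 1
        time.sleep(delay)  # pause between reloads; 0 means back-to-back
    return ok
```

For example, `graceful_reload_loop(1000, delay=1.0)` would issue 1000 reloads one second apart. The delay is a guess; a shorter one may make overlapping reloads more likely.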
Expected behavior
journald position file shouldn't get corrupted with an invalid character.
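To detect this kind of corruption between reloads, one option is to check that the position file still parses as a JSON object. The sketch below assumes the file is a plain JSON object, as the `.json` storage path and the expected leading `{"` suggest. The helper name `position_file_is_valid` is hypothetical.

```python
import json


def position_file_is_valid(path):
    """Return True if `path` exists and contains a single JSON object."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # OSError: file missing or unreadable.
        # ValueError: invalid JSON (json.JSONDecodeError) or invalid UTF-8.
        return False
    return isinstance(data, dict)
```

Running this check after each reload in a reproduction loop would show which reload first left the file unreadable.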
Your Environment
fluentd 1.11.2
Ubuntu 20.04.1 LTS
5.4.0-62-generic
If you hit the problem with older fluentd version, try latest version first.
We tested /api/config.gracefulReload calls with different fluentd versions, but only hit this issue once, so I am guessing it is a pretty rare race condition. We decided to report it anyways, just in case.
Your Configuration
Your Error Log