in_tail throws error and crashes process #3327
Comments
After looking at this more, I found that this has been happening since early this morning, which predates our deployment of v1.12.2, so I don't think it is unique to the latest version. I do believe, though, that many short-lived jobs cause this error. The mitigation is to use the in_tail plugin's exclude_path parameter so that the logs of short-lived, frequently run CronJobs are not tailed at all.
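A minimal sketch of that mitigation as an in_tail source block. The CronJob name in exclude_path, the pos_file location, the tag, and the JSON parser are illustrative assumptions, not values taken from this issue; only the /var/log/containers directory comes from the error log.

```
<source>
  @type tail
  path /var/log/containers/*.log
  # Hypothetical pattern: replace "my-cronjob" with the name prefix of the
  # short-lived CronJob pods whose logs should be skipped.
  exclude_path ["/var/log/containers/my-cronjob-*.log"]
  # Assumed location; use whatever pos_file the deployment already has.
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json
  </parse>
</source>
```

exclude_path takes an array, so several CronJob patterns can be listed in the same parameter.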
If your previous version was 1.12.0 or 1.12.1, you probably saw a different bug, #3274.
It may occur on catching a short-lived log (#3327). Signed-off-by: Takuro Ashie <[email protected]>
This commit fixes the following error reported at #3327. It occurs when a file is removed after calling setup_watcher and before registering the created watcher.

2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now.
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean? each_slice"
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'

Signed-off-by: Takuro Ashie <[email protected]>
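The failure mode in that trace can be modeled in a few lines of Ruby. This is a simplified sketch, not the Fluentd source and not the actual fix. It only assumes what the trace shows: stop_watchers calls each_value on its argument, and the ENOENT rescue path handed it a single TargetInfo. The struct fields, the wrap_single_target flag, and the file path are made up for illustration.

```ruby
# Stand-in for Fluent::Plugin::TailInput::TargetInfo (fields are assumed).
TargetInfo = Struct.new(:path, :ino)

# Shape implied by the stack trace: iterates over a Hash of targets.
def stop_watchers(targets_info)
  stopped = []
  targets_info.each_value { |target| stopped << target.path }
  stopped
end

# Hypothetical start-up loop. File.stat raises Errno::ENOENT when the file
# disappeared between discovery and registration.
def start_watchers(targets_info, wrap_single_target:)
  dropped = []
  targets_info.each_value do |target|
    begin
      File.stat(target.path)
    rescue Errno::ENOENT
      # Passing the bare struct reproduces the NoMethodError; wrapping it in
      # a Hash gives stop_watchers the collection it expects.
      arg = wrap_single_target ? { target.path => target } : target
      dropped.concat(stop_watchers(arg))
    end
  end
  dropped
end

missing = TargetInfo.new("/nonexistent/short-lived-job.log", 0)
targets = { missing.path => missing }

# Buggy path: capture the error instead of crashing the script.
buggy_error =
  begin
    start_watchers(targets, wrap_single_target: false)
    nil
  rescue NoMethodError => e
    e
  end

# Guarded path: the vanished file is dropped and start-up continues.
dropped = start_watchers(targets, wrap_single_target: true)
```

In the sketch the error only appears when a file vanishes inside that narrow window, which fits the report that a single node with many one-minute CronJobs was affected.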
Awesome. Thank you, @ashie, for the quick turn-around here. Much appreciated.
Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.
Describe the bug
We are seeing an exception thrown while Fluentd is starting up, which causes the Fluentd process to crash. We suspect it is caused by short-lived, frequently run K8s CronJobs: the Docker logs no longer exist, but the symlinks are still there, so in_tail picks them up and throws this error. We were trying to fix the in_tail bugs discussed in #3239, which is why we are specifically trying to deploy this version.
To Reproduce
Have several CronJobs that run every 1 min on a node and only live for a very short duration.
Expected behavior
Fluentd should continue to run when a file picked up by in_tail no longer exists, either by handling this condition without raising the error or by having the underlying bug fixed.
Your Environment
Fluentd version: v1.12.2
Note: We are building and installing from source following this guide. We have been doing so before this release without issues.
Ruby version: 2.7.2
Operating system: CentOS 7
Kernel version: 4.15.0-70-generic
If you hit the problem with older fluentd version, try latest version first.
This happens on the latest version of Fluentd.
Your Configuration
Your Error Log
Additional context
We have hundreds of Fluentd instances running and there is a single node that seems to be hitting this problem. While looking at the node, I suspect it is caused by many short-lived, often run (every 1 min), CronJobs where the log files don't exist because the containers are now gone but symlinks still exist. Please let me know if there is more information I can provide here to help troubleshoot this issue.
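One way to check that suspicion on the affected node is to list the symlinks whose targets are gone. The sketch below is a hypothetical diagnostic helper, not part of Fluentd. The dangling_symlinks name and the temporary-directory demo are illustrative only.

```ruby
require "tmpdir"

# Return the symlinks under `dir` matching `pattern` whose targets no longer
# exist. File.exist? follows the link, so it is false for a dangling symlink.
def dangling_symlinks(dir, pattern = "*.log")
  Dir.glob(File.join(dir, pattern)).select do |path|
    File.symlink?(path) && !File.exist?(path)
  end.sort
end

# Self-contained demo in a temporary directory: one live link, one dangling.
result = Dir.mktmpdir do |dir|
  target = File.join(dir, "real-target")
  File.write(target, "line\n")
  File.symlink(target, File.join(dir, "alive.log"))
  File.symlink(File.join(dir, "gone-target"), File.join(dir, "dangling.log"))
  dangling_symlinks(dir).map { |p| File.basename(p) }
end
```

On a node, the same helper could be pointed at the directory from the error log, for example dangling_symlinks("/var/log/containers"). A long list there would support the CronJob theory.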