in_tail throws error and crashes process #3327

davidbhoward · 2021-04-12T23:22:55Z

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug
We are seeing an exception being thrown while Fluentd is starting up, which is causing Fluentd process to crash. We suspect these are caused by short-lived, often run, K8s CronJobs where the Docker logs no longer exists, but the symlinks are still there and end up causing in_tail to pick it up and throw this error. We were trying to fix the in_tail bugs discussed in #3239, which is why we are specifically trying to deploy this version.

To Reproduce
Have several CronJobs that run every 1 min on a node and only live for a very short duration.

Expected behavior
Fluentd would continue to run if the in_tail file doesn't exist and this condition is hit that causes this error or bug is fixed.

Your Environment

Fluentd or td-agent version: v1.12.2.
Note: We are building and installing from source following this guide. We have been doing so before this release without issues.
Ruby: 2.7.2
Operating system: CentOS 7
Kernel version: 4.15.0-70-generic

If you hit the problem with older fluentd version, try latest version first.
This happens on the latest version of Fluentd.

Your Configuration

    <source>
      @type tail
      @log_level debug
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      follow_inodes true
      <parse>
        @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

Your Error Log

2021-04-12 16:00:21 -0700 [warn]: /var/log/containers/obfuscated_container_xxx_1.log not found. Continuing without tailing it.
2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now.
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:359:in `refresh_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:234:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:200:in `block in start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:189:in `block (2 levels) in lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `each'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `block in lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `each'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:199:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:248:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:147:in `run'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:700:in `block in run_worker'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:951:in `main_process'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:691:in `run_worker'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/command/fluentd.rb:365:in `<top (required)>'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/bin/fluentd:8:in `<top (required)>'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `load'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `<main>'
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
  2021-04-12 16:00:21 -0700 [error]: suppressed same stacktrace

Additional context

We have hundreds of Fluentd instances running and there is a single node that seems to be hitting this problem. While looking at the node, I suspect it is caused by many short-lived, often run (every 1 min), CronJobs where the log files don't exist because the containers are now gone but symlinks still exist. Please let me know if there is more information I can provide here to help troubleshoot this issue.

The text was updated successfully, but these errors were encountered:

davidbhoward · 2021-04-13T00:06:25Z

After looking at this more, this has been happening since earlier this morning, which is longer than we deployed v1.12.2, so I don't think this is unique to this latest version. I do believe though that many short-lived jobs will cause this error to happen. The mitigation here is to use the exclude_path for the in_tail plugin and just not input any short-lived, often run, CronJobs.

ashie · 2021-04-13T01:02:55Z

Thank you for you report!
This crash is definitely from v1.12.2, by #3275 (d066aa2)

ashie · 2021-04-13T01:10:30Z

After looking at this more, this has been happening since earlier this morning, which is longer than we deployed v1.12.2, so I don't think this is unique to this latest version.

If your previous version is 1.12.0 or 1.12.1, probably you saw a different bug #3274.

It may occur on catching a short-lived log (#3327). Signed-off-by: Takuro Ashie <[email protected]>

This commit fixes the following error reported at #3327, it will occur when a file is removed after calling setup_watcher and before registering the created watcher. 2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now. 2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean? each_slice" 2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers' 2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers' 2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers' 2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value' 2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers' Signed-off-by: Takuro Ashie <[email protected]>

davidbhoward · 2021-04-13T16:17:04Z

Awesome. Thank you, @ashie, for the quick turn-around here. Much appreciated.

davidbhoward changed the title ~~v1.12.2 in_tail throws error and crashes process~~ in_tail throws error and crashes process Apr 12, 2021

ashie added the bug Something isn't working label Apr 13, 2021

ashie added a commit that referenced this issue Apr 13, 2021

in_tail: Fix an incorrect error handling

9925845

It may occur on catching a short-lived log (#3327). Signed-off-by: Takuro Ashie <[email protected]>

This was referenced Apr 13, 2021

in_tail: Fix an incorrect error handling on catching a short-lived log #3328

Merged

Release 4.1.1 fluent/fluent-package-builder#288

Merged

ashie closed this as completed in #3328 Apr 13, 2021

This was referenced Apr 14, 2021

Tail input plugin doesn't respect SIGTERM signal #3323

Closed

Add log throttling per file (revised) #3185

Merged

This was referenced Apr 23, 2021

FluentD errors on a short living pod and seems to lose logging at the error event #3348

Closed

New k8s container logs are not getting picked up by in_tail plugin #3355

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

in_tail throws error and crashes process #3327

in_tail throws error and crashes process #3327

davidbhoward commented Apr 12, 2021 •

edited

Loading

davidbhoward commented Apr 13, 2021 •

edited

Loading

ashie commented Apr 13, 2021 •

edited

Loading

ashie commented Apr 13, 2021

davidbhoward commented Apr 13, 2021

in_tail throws error and crashes process #3327

in_tail throws error and crashes process #3327

Comments

davidbhoward commented Apr 12, 2021 • edited Loading

davidbhoward commented Apr 13, 2021 • edited Loading

ashie commented Apr 13, 2021 • edited Loading

ashie commented Apr 13, 2021

davidbhoward commented Apr 13, 2021

davidbhoward commented Apr 12, 2021 •

edited

Loading

davidbhoward commented Apr 13, 2021 •

edited

Loading

ashie commented Apr 13, 2021 •

edited

Loading