
FluentD errors on a short living pod and seems to lose logging at the error event #3348

Closed
Mosibi opened this issue Apr 23, 2021 · 3 comments

Comments

@Mosibi
Mosibi commented Apr 23, 2021

Describe the bug
We test our logging framework after each nightly cluster installation, and we often see that logs for some pods are missing. When that happens, it usually follows this error message:

2021-04-23 03:35:55 +0000 [warn]: stat() for /var/log/containers/es-query-nwbx8_test_es-query-4c8788baaa3785e06488d809eeee67a9ab459d9534059a76b402a807651747bc.log failed with ENOENT. Drop tail watcher for now.
2021-04-23 03:35:55.936433049 +0000 fluent.warn: {"message":"stat() for /var/log/containers/es-query-nwbx8_test_es-query-4c8788baaa3785e06488d809eeee67a9ab459d9534059a76b402a807651747bc.log failed with ENOENT. Drop tail watcher for now."}
2021-04-23 03:35:55 +0000 [error]: [in_tail_container_logs] Unexpected error raised. Stopping the timer. title=:in_tail_refresh_watchers error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00007f7ebd3385b8>\nDid you mean?  each_slice"
2021-04-23 03:35:55.936735140 +0000 fluent.error: {"title":"in_tail_refresh_watchers","error":"#<NoMethodError: undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00007f7ebd3385b8>\nDid you mean?  each_slice>","message":"[in_tail_container_logs] Unexpected error raised. Stopping the timer. title=:in_tail_refresh_watchers error_class=NoMethodError error=\"undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00007f7ebd3385b8>\\nDid you mean?  each_slice\""}
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:359:in `refresh_watchers'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run_once'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2021-04-23 03:35:55 +0000 [error]: /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2021-04-23 03:35:55 +0000 [error]: [in_tail_container_logs] Timer detached. title=:in_tail_refresh_watchers
2021-04-23 03:35:55.938884400 +0000 fluent.error: {"title":"in_tail_refresh_watchers","message":"[in_tail_container_logs] Timer detached. title=:in_tail_refresh_watchers"}

When this happens, logs from a job/pod that ran before this event are not present in Elasticsearch, so it seems that FluentD is still processing those logs and stops when this error occurs.
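The stack trace points at `stop_watchers` (`in_tail.rb:428`) calling `each_value` on a single `TargetInfo` where a Hash of `path => TargetInfo` was apparently expected. A minimal stand-in sketch of that failure mode — using a simplified, hypothetical `TargetInfo` and `stop_watchers`, not fluentd's actual code:

```ruby
# Hypothetical stand-in for Fluent::Plugin::TailInput::TargetInfo
# (the real class in fluentd 1.12.x is also a small path/inode struct).
TargetInfo = Struct.new(:path, :ino)

# Simplified sketch of stop_watchers: it iterates a Hash of
# path => TargetInfo via each_value.
def stop_watchers(targets)
  stopped = []
  targets.each_value { |t| stopped << t.path }  # raises NoMethodError if given a bare TargetInfo
  stopped
end

info = TargetInfo.new("/var/log/containers/app.log", 42)

begin
  stop_watchers(info)                  # bug: a bare TargetInfo, not a Hash
rescue NoMethodError => e
  puts e.message                       # undefined method `each_value' for #<struct TargetInfo ...>
end

stop_watchers({ info.path => info })   # expected shape: Hash keyed by path
```

Because the `NoMethodError` escapes the `in_tail_refresh_watchers` timer callback, fluentd detaches the timer ("Timer detached"), so no new container log files are picked up afterwards — which would explain the missing logs.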

To Reproduce
Run a job that generates logging, and when that job is done, run a second job (pod) that checks whether the generated logging is present in Elasticsearch.

Expected behavior
No loss of logging events

Your Environment

# fluentd --version
fluentd 1.12.2

The container is running within kubernetes, running on RHEL 8.2

@ashie
Member
ashie commented Apr 23, 2021

This seems to be the same issue as #3327 and should be fixed by v1.12.3.
I released it just a while ago.
Please try it.

@ashie ashie closed this as completed Apr 23, 2021
@ashie
Member

ashie commented Apr 23, 2021

> I've released it just a while ago.

The Docker image isn't released yet, though.

@Mosibi
Author

Mosibi commented Apr 23, 2021

@ashie Thanks, I've just built our own version of the container, and tonight we will run a full test with it.
