
Failed to flush the buffer, and the file buffer directory is filled with bad logs. #2534

Closed
liuchintao opened this issue Jul 31, 2019 · 5 comments
Labels
bug Something isn't working

Comments

@liuchintao

Describe the bug

Hi, I want to use Fluentd to collect logs from my Kubernetes worker nodes and send them to Elasticsearch. At first everything was fine, but some files under the fluentd agent's file buffer directory contain characters that cannot be decoded. Using cat to inspect these files in MobaXTerm, they look like this:

▒▒]@▒(
9▒▒▒log▒[10.110.210.37 - - [31/Jul/2019:00:58:13 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.14"
▒stream▒stdout▒docker▒▒container_id▒@8e491a98cf9e675e6c513f4835d3325b62c68268490d74b2557ea649c484b50a▒kubernetes▒▒container_name▒registry▒namespace_name▒goharbor▒pod_name▒*cambricon-harbor-registry-85cf6bc75c-9whjz▒pod_id▒$0ef30ad9-b2ca-11e9-8051-00ff00e051ec▒labels▒▒app▒harbor▒chart▒harbor▒component▒registry▒heritage▒Tiller▒pod-template-hash▒85cf6bc75c▒release▒cambricon▒host▒mockvm-deploy6▒master_url▒https://10.233.0.1:443/api▒namespace_id▒$3c1e3039-b295-11e9-a14a-00ff002fbaf1▒▒]@▒8)˞▒▒log▒▒time="2019-07-31T00:58:16.941452121Z" level=info msg="response completed" go.version=go1.11.8 http.request.host="cambricon-harbor-registry:5000" http.request.id=af2d2f4b-806f-4d2b-80b2-86a6df06bf79 http.request.method=GET http.request.remoteaddr="10.233.79.210:41064" http.request.uri="/v2" http.request.useragent="Go-http-client/1.1" http.response.contenttype="text/html; charset=utf-8" http.response.duration="166.355µs" http.response.status=301 http.response.written=39
▒stream▒stderr▒docker▒▒container_id▒@8e491a98cf9e675e6c513f4835d3325b62c68268490d74b2557ea649c484b50a▒kubernetes▒▒container_name▒registry▒namespace_name▒goharbor▒pod_name▒*cambricon-harbor-registry-85cf6bc75c-9whjz▒pod_id▒$0ef30ad9-b2ca-11e9-8051-00ff00e051ec▒labels▒▒app▒harbor▒chart▒harbor▒component▒registry▒heritage▒Tiller▒pod-template-hash▒85cf6bc75c▒release▒cambricon▒host▒mockvm-deploy6▒master_url▒https://10.233.0.1:443/api▒namespace_id▒$3c1e3039-b295-11e9-a14a-00ff002fbaf1▒▒]@▒8*a˄▒log▒a10.233.79.210 - - [31/Jul/2019:00:58:16 +0000] "GET /v2 HTTP/1.1" 301 39 "" "Go-http-client/1.1"

So Fluentd cannot send them to Elasticsearch.

I was under the impression that Fluentd would move these bad chunks to a tmp directory, but that does not happen. A large number of unprocessed logs are taking up a lot of disk space.

To Reproduce

  1. Before this issue arose, I allocated less memory to the fluentd pods; I wanted to see how a smaller memory limit would lead to the pods being OOM-killed (see the sketch after this list).
  2. I then found "unexpected error while checking flushed chunks" errors before the fluentd pods restarted because of OOM.
  3. Although I did not find the cause of the OOM and the flush-chunk errors, I decided to give the fluentd pods their normal memory back. At this point I had 5 fluentd pods, and 2 of them had been OOM-killed and restarted several times.
  4. After reallocating the memory, I found a lot of "failed to flush the buffer" errors logged by the two fluentd pods that had been OOM-killed.
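
Step 1 amounts to lowering the memory limit on the fluentd DaemonSet. A minimal sketch of that change (the namespace, DaemonSet, and container names are assumptions based on the linked manifests, and 200Mi is just an example value):

  # lower the memory limit of the fluentd DaemonSet (names and value are placeholders)
  $ kubectl -n kube-system set resources daemonset fluentd-es \
      --containers=fluentd-es --limits=memory=200Mi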

Expected behavior

I expect these bad chunks to be moved to a tmp directory; they take up nearly 920M in the buffer directory.

Your Environment

- fluentd 1.2.4

Your Configuration

My fluentd configuration and pod yaml are similar to https://github.com/kubernetes/kubernetes/blob/a5fcaa87f6bd06af87c9e10eff2a8c539f571cf8/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml and https://github.com/kubernetes/kubernetes/blob/a5fcaa87f6bd06af87c9e10eff2a8c539f571cf8/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml

I just changed the output section as below:

<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  type_name flatten
  include_tag_key true
  host elasticsearch-logging
  port 9200
  logstash_format true
  logstash_prefix flatten-only
  flatten_hashes true
  flatten_hashes_separator _
  #index_name kubernetes-%Y%m%d-1
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    timekey 1h
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_forever
    retry_max_interval 30
    chunk_limit_size 2M
    queue_limit_length 100
    overflow_action block
  </buffer>
</match>

Your Error Log

2019-07-31 02:51:17 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:17 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:17 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:17 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38cd67bb400ab195586623c6b93"
2019-07-31 02:51:18 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:19 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:18 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:19 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:19 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:19 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:19 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38cd6f647e603c31038153d90e0"
2019-07-31 02:51:19 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:20 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:19 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:20 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:20 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:20 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:20 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38dd07c9794e2cc96c3d55dfc61"
2019-07-31 02:51:21 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:22 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:21 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:21 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38ebec1aa6bfdc88ca64100fbcf"
2019-07-31 02:51:21 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:22 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:21 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:22 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:22 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:22 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:22 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38fb43d3eb8571a85d44133c093"
2019-07-31 02:51:23 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:24 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:23 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:24 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:24 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:24 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:24 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec390a6cad79eae20412a5932d101"
2019-07-31 02:51:25 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:26 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:25 +0000 [warn]: suppressed same stacktrace

Additional context

There are thousands of *.log and *.log.meta files in the fluentd file buffer directory.
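
A quick way to confirm how much is piling up inside the pod (using the buffer path from the configuration above):

$ find /var/log/fluentd-buffers -name '*.log*' | wc -l   # count chunk and metadata files
$ du -sh /var/log/fluentd-buffers                        # total size of the stuck buffer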

@liuchintao liuchintao added the bug Something isn't working label Jul 31, 2019
@repeatedly
Member

I expect these bad chunks to be moved to a tmp directory; they take up nearly 920M in the buffer directory.

This has already been resolved since v1.4.0:

https://github.com/fluent/fluentd/blob/master/CHANGELOG.md#enhancements-2
#2301

@liuchintao
Author

@repeatedly Thanks for the reply!

So do I just need to upgrade fluentd to 1.4.0 and add a <secondary> section?

@liuchintao
Author

@repeatedly Hi, thanks for your suggestion. After adding <secondary> to my configuration, fluentd works well, but I could not find the secondary files' directory.
This is my configuration:

    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      type_name flatten
      include_tag_key true
      host elasticsearch-logging
      port 9200
      logstash_format true
      logstash_prefix flatten-only-secondary
      flatten_hashes true
      flatten_hashes_separator _
      #index_name kubernetes-%Y%m%d-1
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        timekey 1h
        retry_type exponential_backoff
        flush_thread_count 4
        flush_interval 5s
        #retry_forever
        retry_timeout 1h
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 100
        overflow_action block
      </buffer>
      # Secondary
      <secondary>
        @type secondary_file
        directory /tmp/fluentd
        basename bad-chunk-${chunk_id}
      </secondary>
    </match>
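
If it helps, one way to check whether anything has actually been written there is to look inside the pod directly (a sketch only; the namespace and pod name are placeholders for the actual fluentd pod):

    # list the secondary_file directory inside the fluentd pod
    $ kubectl -n kube-system exec FLUENTD_POD -- ls -la /tmp/fluentd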

@liuchintao
Author

@repeatedly Hi, after adding the <secondary> section, the logs that could not be pushed to Elasticsearch are moved to the secondary path. Can I manually re-push these logs to Elasticsearch at a later time?

@kenhys
Contributor

kenhys commented Nov 12, 2020

@liuchintao

Maybe you can do something like this:

$ cat PATH_TO_SECONDARY | fluent-cat --msgpack --host HOST --port PORT
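
For all of the backed-up chunks, a loop along these lines might work (a sketch only; TAG, HOST, and PORT are placeholders, and the directory matches the <secondary> configuration above):

$ for f in /tmp/fluentd/bad-chunk-*; do
    cat "$f" | fluent-cat --msgpack --host HOST --port PORT TAG
  done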

@kenhys kenhys closed this as completed Nov 12, 2020