
Failed to flush the buffer, and the file buffer directory is filled with bad logs. #2534

Closed
liuchintao opened this issue Jul 31, 2019 · 5 comments
Labels
bug Something isn't working

Comments

@liuchintao

Describe the bug

Hi, I want to use Fluentd to collect logs from my Kubernetes worker nodes and send them to Elasticsearch. At first everything was fine, but some files under the fluentd agent's file buffer directory contain characters that cannot be decoded. Using cat to inspect these files in MobaXTerm, they look like this:

▒▒]@▒(
9▒▒▒log▒[10.110.210.37 - - [31/Jul/2019:00:58:13 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.14"
▒stream▒stdout▒docker▒▒container_id▒@8e491a98cf9e675e6c513f4835d3325b62c68268490d74b2557ea649c484b50a▒kubernetes▒▒container_name▒registry▒namespace_name▒goharbor▒pod_name▒*cambricon-harbor-registry-85cf6bc75c-9whjz▒pod_id▒$0ef30ad9-b2ca-11e9-8051-00ff00e051ec▒labels▒▒app▒harbor▒chart▒harbor▒component▒registry▒heritage▒Tiller▒pod-template-hash▒85cf6bc75c▒release▒cambricon▒host▒mockvm-deploy6▒master_url▒https://10.233.0.1:443/api▒namespace_id▒$3c1e3039-b295-11e9-a14a-00ff002fbaf1▒▒]@▒8)˞▒▒log▒▒time="2019-07-31T00:58:16.941452121Z" level=info msg="response completed" go.version=go1.11.8 http.request.host="cambricon-harbor-registry:5000" http.request.id=af2d2f4b-806f-4d2b-80b2-86a6df06bf79 http.request.method=GET http.request.remoteaddr="10.233.79.210:41064" http.request.uri="/v2" http.request.useragent="Go-http-client/1.1" http.response.contenttype="text/html; charset=utf-8" http.response.duration="166.355µs" http.response.status=301 http.response.written=39
▒stream▒stderr▒docker▒▒container_id▒@8e491a98cf9e675e6c513f4835d3325b62c68268490d74b2557ea649c484b50a▒kubernetes▒▒container_name▒registry▒namespace_name▒goharbor▒pod_name▒*cambricon-harbor-registry-85cf6bc75c-9whjz▒pod_id▒$0ef30ad9-b2ca-11e9-8051-00ff00e051ec▒labels▒▒app▒harbor▒chart▒harbor▒component▒registry▒heritage▒Tiller▒pod-template-hash▒85cf6bc75c▒release▒cambricon▒host▒mockvm-deploy6▒master_url▒https://10.233.0.1:443/api▒namespace_id▒$3c1e3039-b295-11e9-a14a-00ff002fbaf1▒▒]@▒8*a˄▒log▒a10.233.79.210 - - [31/Jul/2019:00:58:16 +0000] "GET /v2 HTTP/1.1" 301 39 "" "Go-http-client/1.1"

So Fluentd cannot send them to Elasticsearch.

I was under the impression that Fluentd would move these bad chunks to a tmp directory, but that does not happen. A large number of unprocessed logs are taking up a lot of disk space.

To Reproduce

  1. Before this issue arose, I allocated less memory to the fluentd pods; I wanted to see how a smaller memory limit would lead to the pods being OOM-killed (see the sketch after this list).
  2. I then found "unexpected error while checking flushed chunks" errors before the fluentd pods restarted because of OOM.
  3. Although I did not find the cause of the OOM and the flush-chunk errors, I decided to give the fluentd pods their normal memory back. At this point I had 5 fluentd pods, and 2 of them had been OOM-killed and restarted several times.
  4. After reallocating the memory, I found a lot of "failed to flush the buffer" errors logged by the two fluentd pods that had been OOM-killed.
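
Step 1 amounts to lowering the memory limit on the fluentd DaemonSet. A minimal sketch of that change (the namespace, DaemonSet, and container names are assumptions based on the linked manifests, and 200Mi is just an example value):

  # lower the memory limit of the fluentd DaemonSet (names and value are placeholders)
  $ kubectl -n kube-system set resources daemonset fluentd-es \
      --containers=fluentd-es --limits=memory=200Mi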

Expected behavior

I expect these bad chunks to be moved to a tmp directory; they take up nearly 920M in the buffer directory.

Your Environment

- fluentd 1.2.4

Your Configuration

My fluentd configuration and pod yaml are similar to https://github.com/kubernetes/kubernetes/blob/a5fcaa87f6bd06af87c9e10eff2a8c539f571cf8/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml and https://github.com/kubernetes/kubernetes/blob/a5fcaa87f6bd06af87c9e10eff2a8c539f571cf8/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml

I just changed the output section as below:

<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  type_name flatten
  include_tag_key true
  host elasticsearch-logging
  port 9200
  logstash_format true
  logstash_prefix flatten-only
  flatten_hashes true
  flatten_hashes_separator _
  #index_name kubernetes-%Y%m%d-1
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    timekey 1h
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_forever
    retry_max_interval 30
    chunk_limit_size 2M
    queue_limit_length 100
    overflow_action block
  </buffer>
</match>

Your Error Log

2019-07-31 02:51:17 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:17 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:17 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:17 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38cd67bb400ab195586623c6b93"
2019-07-31 02:51:18 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:19 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:18 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:19 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:19 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:19 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:19 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38cd6f647e603c31038153d90e0"
2019-07-31 02:51:19 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:20 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:19 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:20 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:20 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:20 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:20 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38dd07c9794e2cc96c3d55dfc61"
2019-07-31 02:51:21 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:22 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:21 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:21 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38ebec1aa6bfdc88ca64100fbcf"
2019-07-31 02:51:21 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:22 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:21 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:22 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:22 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:22 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:22 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec38fb43d3eb8571a85d44133c093"
2019-07-31 02:51:23 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:24 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:23 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:24 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:24 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:24 +0000 [warn]: suppressed same stacktrace
2019-07-31 02:51:24 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="58eec390a6cad79eae20412a5932d101"
2019-07-31 02:51:25 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-31 02:51:26 +0000 chunk="58ee3cd71335902ed3a8922a6addc100" error_class=MessagePack::UnknownExtTypeError error="unexpected extension type"
  2019-07-31 02:51:25 +0000 [warn]: suppressed same stacktrace

Additional context

There are thousands of *.log and *.log.meta files in the fluentd file buffer directory.
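
A quick way to confirm how much is piling up inside the pod (using the buffer path from the configuration above):

$ find /var/log/fluentd-buffers -name '*.log*' | wc -l   # count chunk and metadata files
$ du -sh /var/log/fluentd-buffers                        # total size of the stuck buffer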

@liuchintao liuchintao added the bug Something isn't working label Jul 31, 2019
@repeatedly
Member

I expect these bad chunks to be moved to a tmp directory; they take up nearly 920M in the buffer directory.

This has already been resolved since v1.4.0:

https://github.com/fluent/fluentd/blob/master/CHANGELOG.md#enhancements-2
#2301

@liuchintao
Author

@repeatedly Thanks for the reply!

So do I just need to upgrade fluentd to 1.4.0 and add a <secondary> section?

@liuchintao
Author

@repeatedly Hi, thanks for your suggestion. After adding <secondary> to my configuration, fluentd works well, but I could not find the secondary files' directory.
This is my configuration:

    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      type_name flatten
      include_tag_key true
      host elasticsearch-logging
      port 9200
      logstash_format true
      logstash_prefix flatten-only-secondary
      flatten_hashes true
      flatten_hashes_separator _
      #index_name kubernetes-%Y%m%d-1
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        timekey 1h
        retry_type exponential_backoff
        flush_thread_count 4
        flush_interval 5s
        #retry_forever
        retry_timeout 1h
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 100
        overflow_action block
      </buffer>
      # Secondary
      <secondary>
        @type secondary_file
        directory /tmp/fluentd
        basename bad-chunk-${chunk_id}
      </secondary>
    </match>
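
If it helps, one way to check whether anything has actually been written there is to look inside the pod directly (a sketch only; the namespace and pod name are placeholders for the actual fluentd pod):

    # list the secondary_file directory inside the fluentd pod
    $ kubectl -n kube-system exec FLUENTD_POD -- ls -la /tmp/fluentd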

@liuchintao
Author

@repeatedly Hi, after adding the <secondary> section, the logs that could not be pushed to Elasticsearch are moved to the secondary path. Can I manually re-push these logs to Elasticsearch at a later time?

@kenhys
Contributor

kenhys commented Nov 12, 2020

@liuchintao

Maybe you can do something like this:

$ cat PATH_TO_SECONDARY | fluent-cat --msgpack --host HOST --port PORT
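
For all of the backed-up chunks, a loop along these lines might work (a sketch only; TAG, HOST, and PORT are placeholders, and the directory matches the <secondary> configuration above):

$ for f in /tmp/fluentd/bad-chunk-*; do
    cat "$f" | fluent-cat --msgpack --host HOST --port PORT TAG
  done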

@kenhys kenhys closed this as completed Nov 12, 2020