
Plugin doesn't process objects correctly, doesn't delete or backup #240

Open
dabelousov opened this issue Mar 11, 2022 · 12 comments
Labels: bug
@dabelousov

  1. Logstash -oss version 7.16-8.1
  2. Docker
  3. K8s - Openshift 4.7
  4. Included in image

Hello. There is a problem with the S3 input plugin when using a private S3-compatible store such as MinIO.
Logstash reads objects normally and sends them to the output, but backup and delete do not work.
Objects stay in the source bucket with no changes. The objects are small JSON access-log files, averaging 1-2 kB in size.

Input config:
    input {
      s3 {
        access_key_id => "${S3_ACCESS_KEY}"
        secret_access_key => "${S3_SECRET_KEY}"
        endpoint => {{ $.Values.s3_connect_endpoint | quote }}
        bucket => "test-bucket"
        prefix => "prefix"
        backup_to_bucket => "backup-bucket"
        backup_add_prefix => "processed"
        delete => true
      }
    }

The IAM role is allowed all actions; I verified this by deleting an object with the mcli tool.
In the S3 access logs I see only successful (200) GET and HEAD requests, and not a single PUT, POST, or DELETE.
In the Logstash log I see only success messages like the ones below:

{"level":"INFO","loggerName":"logstash.inputs.s3","timeMillis":1646814827669,"thread":"[main]<s3","logEvent":{"message":"epaas-caasv3-backups/2022-03-05-09-20-02-312 is updated at 2022-03-05 06:20:02 +0000 and will process in the next cycle"}}

{"level":"INFO","loggerName":"logstash.inputs.s3","timeMillis":1646814827800,"thread":"[main]<s3","logEvent":{"message":"epaas-caasv3-backups/2022-03-05-09-20-02-396 is updated at 2022-03-05 06:20:02 +0000 and will process in the next cycle"}}

{"level":"INFO","loggerName":"logstash.inputs.s3","timeMillis":1646814827932,"thread":"[main]<s3","logEvent":{"message":"epaas-caasv3-backups/2022-03-05-09-20-03-185 is updated at 2022-03-05 06:20:03 +0000 and will process in the next cycle"}}

I found some interesting code:
https://github.com/logstash-plugins/logstash-input-s3/blob/main/lib/logstash/inputs/s3.rb#L383

As I understand it, the plugin compares the last_modified time of the object against the recorded value and, according to my log, postpones processing of the object to the next cycle; after the default 60 seconds, the same thing repeats.
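My reading of that line, as a rough sketch (the method and variable names here are mine, not the plugin's actual identifiers): before backing up or deleting, the plugin re-checks the object's mtime against the value it saw earlier, and any mismatch postpones the object:

```ruby
require 'time'

# Illustrative sketch only, not the plugin's actual code: an object is
# finalized (backed up / deleted) only when its current last_modified
# exactly matches the value recorded earlier; otherwise it is postponed
# to the next cycle, matching the log messages above.
def safe_to_finalize?(listed_mtime, current_mtime)
  current_mtime == listed_mtime
end

listed  = Time.parse("2022-03-05 06:20:02 +0000")
current = Time.parse("2022-03-05 06:20:02 +0000")

puts(safe_to_finalize?(listed, current) ?
       "finalize: backup/delete" :
       "postponed: will process in the next cycle")
```

If the two timestamps never become exactly equal, this check postpones the same object on every cycle.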

I also tried setting sincedb_path => "/tmp/logstash/since.db", but the file is never created.
Objects from the bucket are downloaded to /tmp/logstash/ and stay there.

@dabelousov dabelousov added the bug label Mar 11, 2022
dabelousov pushed a commit to dabelousov/logstash-input-s3 that referenced this issue Mar 16, 2022

@pebosi

pebosi commented May 10, 2022

Same problem here, using the Logstash 8.2.0 Docker image. Switched to a fork...

@Derekt2

Derekt2 commented Feb 20, 2023

Same here, switched to a fork.

@lysenkojito

@kaisecheng any ideas why it happens?

@kaisecheng
Contributor

The reason for comparing the last modified time of the object and the recorded value is to confirm the object has not been updated since the list action. If the object gets updated, its new last modified time pushes it to the next cycle. Removing the comparison would lead to duplication/reprocessing of already ingested data.

Also trying to set sincedb_path => "/tmp/logstash/since.db" , but it is not creating.

The plugin can't work properly without sincedb. Maybe the Logstash user lacks permission to write to that path?
Enabling debug logging should give some hints.
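One way to test the permission theory (a sketch of mine, run as the same user Logstash runs as, e.g. inside the container) is to try creating the file by hand:

```ruby
require 'fileutils'

# Hypothetical check: try to create and write the sincedb file exactly
# where the config points. A failure here would explain why it never appears.
path = "/tmp/logstash/since.db"          # the sincedb_path from the report
FileUtils.mkdir_p(File.dirname(path))    # raises if the directory can't be made
File.write(path, Time.now.to_i.to_s)     # raises if the file isn't writable
puts "sincedb path is writable"
```

If this script raises an Errno::EACCES or similar, the plugin would fail the same way when persisting its state.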

@lysenkojito


We use a MinIO S3 bucket with admin s3:* permissions. Logstash reads the logs fine, but keeps re-reading them all the time.

@kaisecheng
Contributor

but repeats reading them all the time

It sounds like the plugin has an issue updating the sincedb. To compare object timestamps, Logstash needs to write the last modified time to sincedb, otherwise, the objects are reprocessed in the next cycle. Please check if Logstash is able to write to sincedb_path and if the file (sincedb) is updated successfully.

@lysenkojito


Should I write something to sincedb_path myself? And how can I check whether Logstash is able to write to sincedb_path?

@lysenkojito


I tried running two pipelines simultaneously: one using an AWS S3 bucket, the other a MinIO S3 bucket. In both cases I found no errors in debug mode.

The log said that both pipelines had a default sincedb file created, BUT only one actually existed at the mentioned path: the one for the AWS bucket.

It's not local filesystem permissions, and not MinIO permissions (we use admin credentials). There is a lack of logs to understand why this happened.

Please advise how to debug and fix it.

@kaisecheng

@kaisecheng
Contributor

@lysenkojito
The permission I refer to is that the user running Logstash should have enough privilege to write to disk at sincedb_path. Taking the Docker environment as an example, the default user is logstash.

  1. Make sure the logstash user can read and write the sincedb_path
  2. Make sure each s3 input has a unique sincedb_path (this setting must be a file path, not just a directory)
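Applied to the config from the original report, point 2 might look like this (the path is illustrative):

```
input {
  s3 {
    endpoint => {{ $.Values.s3_connect_endpoint | quote }}
    bucket => "test-bucket"
    sincedb_path => "/usr/share/logstash/data/plugins/inputs/s3/test-bucket.sincedb"
    ...
  }
}
```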

BUT there was only one existed at the mentioned path - for aws bucket.

Are you setting the same sincedb path in both pipelines? If the paths are unique, I would expect to see an error in the log for MinIO S3. If you believe it is a bug, the best path forward is to create a new issue including a reproducer with the debug log, config, and pipelines for further investigation. We officially support AWS S3; help for MinIO S3 will be limited.

@lysenkojito


@kaisecheng
The sincedb paths were left at their defaults. They had different names, but the same folder, …/s3/.

It's definitely not a permissions issue.
Okay, I'll create an issue. Thank you.

@volter
Copy link

volter commented Aug 2, 2024

In my setup with MinIO, I found the problem to be that the compared timestamps are not exactly equal: one of them is 12345678.0, the other 12345678.863. I don't fully understand where these timestamps come from, so I don't know whether the precision is supposed to matter.

This is a systematic mismatch between these two pieces of information, so the code never takes the branch that creates a sincedb entry or deletes objects. As a consequence, Logstash loops madly, consuming a lot of CPU time.
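The mismatch is easy to reproduce in isolation (a minimal sketch; the two Time values below are stand-ins for wherever the plugin actually obtains them):

```ruby
# The two timestamps observed above: one carries whole-second precision,
# the other millisecond precision. Exact equality never holds, so an
# equality-gated backup/delete step would postpone the object forever.
listed  = Time.at(12345678.0)
fetched = Time.at(12345678.863)

puts listed == fetched            # exact comparison: false
puts listed.to_i == fetched.to_i  # second-granularity comparison: true
```

A comparison truncated to second granularity would match here, but whether that is the right fix depends on where each timestamp originates.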
