Miss files due to the same last modified time #191

Open
brucezhao11 opened this issue Oct 28, 2019 · 2 comments
brucezhao11 commented Oct 28, 2019

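We're using the Logstash S3 input plugin to consume files written by Spark. Because Spark writes its output files concurrently, many of them share the same last-modified time. Since S3 does not guarantee strong consistency, a listing may return only some of those files, and Logstash, which only picks up objects whose last-modified time is newer than the one it last recorded, can then skip the rest. As a result, some files are never processed.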
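A minimal Python sketch of the failure mode, assuming (as the issue title suggests) that the plugin stores the newest last-modified time it has processed and only picks up objects whose last-modified time is strictly later. The `poll` function and the `watermark` value are hypothetical stand-ins, not the plugin's actual code or API.

```python
from datetime import datetime

# Hypothetical model of a last-modified watermark ("sincedb"): only objects
# strictly newer than the stored time are processed, then the watermark advances.
def poll(listing, watermark):
    """listing: list of (key, last_modified). Returns (processed_keys, new_watermark)."""
    new = [(k, t) for k, t in listing if t > watermark]  # strict comparison
    processed = [k for k, _ in sorted(new, key=lambda kt: kt[1])]
    if new:
        watermark = max(t for _, t in new)
    return processed, watermark

t = datetime(2019, 10, 27, 2, 56, 3)
epoch = datetime(1970, 1, 1)

# Poll 1: an eventually consistent listing returns only part-00000.
first, wm = poll([("part-00000", t)], epoch)
# Poll 2: part-00001 is now visible, but it has the same last-modified time.
second, wm = poll([("part-00000", t), ("part-00001", t)], wm)

print(first)   # ['part-00000']
print(second)  # []  -> part-00001 is never processed
```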

From the S3 documentation:
A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.

File List:
2019-10-27 02:56:03 214283086 part-00000-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214282388 part-00001-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:55:59 213951314 part-00002-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214436993 part-00003-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214117584 part-00004-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:55:59 214373123 part-00005-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214342724 part-00006-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214619587 part-00007-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:55:59 214146139 part-00008-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214505891 part-00009-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:56:03 214004818 part-00010-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
2019-10-27 02:55:59 214139449 part-00011-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json
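Grouping the listing above by last-modified time shows how many objects tie. The short Python tally below copies the timestamps from that listing (object names shortened to their `part-NNNNN` prefix). Under the strict newer-than assumption, if one 02:56:03 object is processed before its siblings show up in a listing, the other objects sharing that timestamp are at risk of being skipped.

```python
from collections import Counter

# Last-modified times copied from the file listing (suffix
# "-2b1fd6a2-1eb5-4ce4-8f01-3541482c6d4a-c000.json" omitted).
listing = {
    "part-00000": "2019-10-27 02:56:03",
    "part-00001": "2019-10-27 02:56:03",
    "part-00002": "2019-10-27 02:55:59",
    "part-00003": "2019-10-27 02:56:03",
    "part-00004": "2019-10-27 02:56:03",
    "part-00005": "2019-10-27 02:55:59",
    "part-00006": "2019-10-27 02:56:03",
    "part-00007": "2019-10-27 02:56:03",
    "part-00008": "2019-10-27 02:55:59",
    "part-00009": "2019-10-27 02:56:03",
    "part-00010": "2019-10-27 02:56:03",
    "part-00011": "2019-10-27 02:55:59",
}

# Count how many objects share each last-modified time.
ties = Counter(listing.values())
for ts, n in sorted(ties.items()):
    print(ts, n)
# 2019-10-27 02:55:59 4
# 2019-10-27 02:56:03 8
```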

@jasonpepper

I think this is a duplicate of issue #57


hard-working-boy commented Mar 6, 2020

#57
Set sincedb_path to /dev/null to ignore mtime.
@jasonpepper The pull request that adds the sincedb_disabled property hasn't been accepted. Is it not going to be merged?
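Pointing sincedb_path at /dev/null sidesteps the timestamp comparison, assuming the plugin treats an empty sincedb as "nothing processed yet"; the likely trade-off is that every object still in the bucket qualifies again on each poll unless it is deleted or moved after processing. A different approach, sketched below in Python as a hypothetical design and not something the plugin offers, keeps the watermark but also remembers which keys were already processed at exactly that timestamp, so a late-appearing object with a tied last-modified time is still picked up. `TieAwareWatermark` and `poll` are illustrative names only.

```python
from datetime import datetime

class TieAwareWatermark:
    """Hypothetical state: the newest last-modified time processed, plus
    every key already processed at exactly that time."""

    def __init__(self):
        self.latest = datetime.min
        self.keys_at_latest = set()

    def should_process(self, key, last_modified):
        if last_modified > self.latest:
            return True
        # Ties are processed unless this exact key was already handled.
        return last_modified == self.latest and key not in self.keys_at_latest

    def mark_processed(self, key, last_modified):
        if last_modified > self.latest:
            self.latest = last_modified
            self.keys_at_latest = {key}
        elif last_modified == self.latest:
            self.keys_at_latest.add(key)

def poll(listing, state):
    """listing: list of (key, last_modified). Returns keys processed this poll."""
    processed = []
    # Oldest first, so the watermark only moves forward within a poll.
    for key, ts in sorted(listing, key=lambda kt: (kt[1], kt[0])):
        if state.should_process(key, ts):
            processed.append(key)
            state.mark_processed(key, ts)
    return processed

t = datetime(2019, 10, 27, 2, 56, 3)
state = TieAwareWatermark()
print(poll([("part-00000", t)], state))                     # ['part-00000']
print(poll([("part-00000", t), ("part-00001", t)], state))  # ['part-00001']
```

This only covers ties: an object that becomes visible after newer objects have already been processed would still be skipped, and the key set would need to be persisted alongside the timestamp to survive restarts.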
