

Unusual number of HEAD requests being made by S3 input plugin #47

Open
vanga opened this issue Jun 16, 2015 · 2 comments

Comments


vanga commented Jun 16, 2015

We collect S3 access logs in a bucket and use the S3 input plugin to index these files into ELK.

After a couple of months of usage, we noticed an unusual number of requests being made to S3 (~1 billion/month), costing about $440. This is only the per-request charge, which is negligible for most use cases and which hardly anyone pays attention to.

The billing reports showed around 950 million HEAD requests made to the bucket that holds these logs.
The S3 input plugin must be the one making all of these requests (file watching?).
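A back-of-the-envelope model shows how a polling loop could reach numbers like these. This is a minimal sketch that assumes, hypothetically, that the plugin issues one HEAD request per object in the watched prefix on every polling cycle and polls every 60 seconds; neither is confirmed plugin behavior, and the function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assume a 30-day month

def head_requests_per_month(num_objects: int, interval_s: int = 60) -> int:
    """Estimate monthly HEAD requests, assuming one HEAD per object per poll."""
    polls = SECONDS_PER_MONTH // interval_s
    return num_objects * polls

def implied_object_count(head_requests: int, interval_s: int = 60) -> float:
    """Invert the model: how many objects would explain a given HEAD count?"""
    polls = SECONDS_PER_MONTH // interval_s
    return head_requests / polls

print(SECONDS_PER_MONTH // 60)                   # 43200 polls per month
print(round(implied_object_count(950_000_000)))  # 21991 objects
```

Under these assumptions, only about 22,000 log objects polled once a minute would account for the reported HEAD requests. Because access logs keep accumulating, the request count grows with the bucket unless old files leave the watched prefix or the interval is raised.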

I am not sure whether some kind of optimization is needed on the plugin side.
My assumption is that the logs people store in S3 don't change over time, so once a file has been indexed, there is no need to keep watching it.

From the user's perspective, the options I can think of to avoid these requests are:

  1. Move the files to a different location after indexing is done.
  2. Download the files to a local drive with a cron job and use the file input plugin to index them into ES.
  3. Use daily prefixes so that the plugin watches only the current day's files (the log files are named with timestamps).
  4. Raise the default interval if some delay is acceptable. S3 access logs are generated hourly, so there is an hour of delay anyway.
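Option 3 could be driven by a small script, run once a day (for instance from cron), that computes the current day's prefix. This is a minimal sketch assuming the logs follow the S3 server access log key format `<TargetPrefix>YYYY-mm-DD-HH-MM-SS-<UniqueString>` and that the plugin's watched prefix is regenerated from the script's output; the function names and example keys are hypothetical.

```python
from datetime import datetime, timezone

def daily_prefix(target_prefix: str, when: datetime) -> str:
    """Return a key prefix matching one UTC day of S3 server access logs.

    Assumes keys look like <TargetPrefix>YYYY-mm-DD-HH-MM-SS-<UniqueString>.
    """
    day = when.astimezone(timezone.utc)
    return f"{target_prefix}{day:%Y-%m-%d}"

def keys_for_day(keys: list[str], prefix: str) -> list[str]:
    """Keep only the keys a plugin watching `prefix` would still see."""
    return [k for k in keys if k.startswith(prefix)]

# Hypothetical object keys, for illustration only.
keys = [
    "logs/2015-06-15-23-59-01-ABC123",
    "logs/2015-06-16-00-10-02-DEF456",
    "logs/2015-06-16-01-15-09-GHI789",
]
prefix = daily_prefix("logs/", datetime(2015, 6, 16, 12, 0, tzinfo=timezone.utc))
print(prefix)                      # logs/2015-06-16
print(keys_for_day(keys, prefix))  # the two 2015-06-16 keys
```

Log delivery can lag, so a real setup would likely need to keep watching the previous day's prefix for a while after midnight. That is the same real-time trade-off raised later in the thread.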

Any opinions and suggestions are welcome.

Thanks


enVolt commented Jul 4, 2019

@vanga it's a very old thread, but did you figure out a solution for this? I'm in the same situation right now, specifically with feeding Load Balancer logs.

I'm thinking of implementing option 3, but that comes at the cost of losing real-time logs.


vanga commented Jul 6, 2019

I went with the first option.
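For reference, option 1 can be done outside the plugin. S3 has no rename operation, so a "move" is a copy followed by a delete. This is a minimal sketch assuming a boto3 S3 client (its `copy_object` and `delete_object` calls) and a hypothetical `processed/` archive prefix that lies outside the prefix the plugin watches; a recording stand-in replaces the real client so the example runs without AWS credentials. Later versions of the Logstash S3 input also document `backup_to_bucket` and `delete` settings that may make a separate script unnecessary.

```python
def archive_object(s3, bucket: str, key: str, archive_prefix: str = "processed/") -> str:
    """Move one object under archive_prefix: copy it, then delete the original.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    copy_object handles objects up to 5 GB, which is ample for log files.
    """
    new_key = archive_prefix + key
    s3.copy_object(Bucket=bucket, Key=new_key,
                   CopySource={"Bucket": bucket, "Key": key})
    s3.delete_object(Bucket=bucket, Key=key)  # only after the copy succeeded
    return new_key

class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting S3."""
    def __init__(self):
        self.calls = []
    def copy_object(self, **kwargs):
        self.calls.append(("copy_object", kwargs))
    def delete_object(self, **kwargs):
        self.calls.append(("delete_object", kwargs))

client = RecordingClient()
print(archive_object(client, "my-log-bucket", "logs/2015-06-16-00-10-02-DEF456"))
# processed/logs/2015-06-16-00-10-02-DEF456
```

Running this for each file right after it is indexed keeps the watched prefix small, so per-poll request counts stay roughly constant instead of growing with the bucket.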
