Filebeat can be asked for a listing of files that can be safely deleted #42278

Open
sleekweasel opened this issue Jan 9, 2025 · 2 comments
Labels
needs_team Indicates that the issue/PR needs a Team:* label

Comments

@sleekweasel

sleekweasel commented Jan 9, 2025

Describe the enhancement:

There should be a way to ask Filebeat which files have been fully uploaded (or are too old to be uploaded), so that a delete-dead-files script doesn't have to know the registry format or otherwise duplicate information Filebeat already has.

A format simple enough to parse in shell would seem ideal: one record per line, made up of key=value fields, none of which may contain spaces, with file= as the final field. The file= value runs to end-of-line, so it can include spaces, quotes, colons, backslashes, tabs, and anything else the host OS accepts in filenames.

e.g. sed '/uploaded=false/ d; s/.*file=\(.*\)/\1/' | tr '\n' '\0' | xargs -0 rm -r -- (the names are NUL-delimited so that xargs does not split them on spaces)
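As a rough sketch of how a cleanup script might consume such a listing, the function below reads records in the format described above. The format is only the proposal from this issue, not something Filebeat emits today, and the field name `uploaded` and the function name `list_uploaded` are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical record format (proposed in this issue, not an existing Filebeat output):
#   uploaded=true size=1234 file=/var/log/app/2025-01-09 10.log

# Read records on stdin, one per line, and print the file= value of every
# record that carries uploaded=true.
list_uploaded() {
    while IFS= read -r line; do
        case "$line" in
            *" file="*)
                # Split at the FIRST " file=": key=value fields cannot contain
                # spaces, so anything after it belongs to the filename.
                fields=${line%%" file="*}
                name=${line#*" file="}
                ;;
            *)
                # Not a record in the expected shape; skip it.
                continue
                ;;
        esac
        case " $fields " in
            *" uploaded=true "*) printf '%s\n' "$name" ;;
        esac
    done
}
```

It could then be used as `list_uploaded < /target/file | tr '\n' '\0' | xargs -0 rm -f --`, where /target/file stands for whatever file the listing was written to. Splitting on the first " file=" rather than the last keeps filenames that themselves contain "file=" intact.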

The trigger should be easy to use both for whoever is creating files in the given directory (in a 'you created it, you delete it' sense) and for something scheduled in the same container as Filebeat. One or more of:

  • Running filebeat with a --status /target/file option, which triggers the already-running Filebeat to write to the nominated file
  • Sending kill -USR1, or touching the config file, to make Filebeat write to a predefined file
  • Creating a new, world-writeable file matching some pattern, to be populated by Filebeat
  • Something better

Describe a specific use case for the enhancement or feature:

I understand #714 to mean that Filebeat isn't going to delete files, but since it's the only source of truth about what has and hasn't been uploaded, it should make that information readily available in a format that doesn't smell like going through someone else's secrets.

  1. I write logs to files with a date+hour based filename, and want to delete all but the latest one (to which data is still being periodically appended) once Filebeat has uploaded them, but I don't want to parse the registry myself and potentially introduce bugs in my date analysis code or whatever.

  2. During a CI test, I create fully formed log files in a directory (copied from the CI's work area) and want to delete any files from this or previous builds after they've been sent, again without writing potentially buggy code to parse the registry.
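For the first use case, the "all but the latest one" step could be sketched as below. It assumes the date+hour filenames sort chronologically (for example app-2025010913.log) and that candidate paths arrive on stdin as DIR/name with the same DIR passed as the argument; the function name `exclude_newest` is hypothetical.

```shell
#!/bin/sh
# Usage: exclude_newest DIR   (candidate paths on stdin, one per line)
# Prints every candidate except the file in DIR whose name sorts last,
# which is assumed to be the one still being appended to.
exclude_newest() {
    newest=
    for f in "$1"/*; do
        # Globs expand in sorted order, so the last regular file wins.
        [ -f "$f" ] && newest=$f
    done
    if [ -n "$newest" ]; then
        # -x -F: drop only exact, literal matches of the newest path.
        # grep exits 1 when nothing is left to print; treat that as success.
        grep -v -x -F -e "$newest" || [ $? -eq 1 ]
    else
        cat
    fi
}
```

The newest file is taken from the directory itself rather than from the candidate list, so it stays protected even when it does not appear among the candidates.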

@botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Jan 9, 2025
@botelastic

botelastic bot commented Jan 9, 2025

This issue doesn't have a Team:<team> label.

@sleekweasel
Author

To reinforce this issue: I took heed of the documentation that says 'log is deprecated, use filestream instead' and migrated my config.

So now the registry has changed format, and I have to puzzle my way through jq syntax to read both formats and hope my logic works.

If Filebeat is supposed to be a reliable mechanism, it's failing at that when it doesn't provide a closed loop that makes it clear when a file has been transferred. Instead it relies on the hope that 3 hours is enough, or on people parsing some random transaction file with no official documentation.
