Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chore][fileconsumer/archive] - add signature methods for archive feature #35098

Merged

Conversation

VihasMakwana
Copy link
Contributor

@VihasMakwana VihasMakwana commented Sep 10, 2024

Description:

This PR introduces the archive method to store files older than 3 poll cycles.
The core logic for reading will be introduced in a follow up PR.

Link to tracking Issue: #32727

Testing: To be included in future PRs

Documentation: To be included in future PRs

@VihasMakwana
Copy link
Contributor Author

VihasMakwana commented Sep 10, 2024

@djaglowski This is a draft PR which separates the reader creation and archiving module as per #32727 (comment)

let me know which looks good.

@VihasMakwana VihasMakwana force-pushed the create-archive-storage-temp branch 3 times, most recently from 7496847 to a12737f Compare September 10, 2024 06:12

type FileRecord struct {
File *os.File
Fingerprint *fingerprint.Fingerprint
Copy link
Contributor Author

@VihasMakwana VihasMakwana Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djaglowski storing file handle is crucial for us because of the way we handle the archiving:

  1. We try to match files against knownFiles array.
  2. If a match isn't found, we would accumulate the unmatched files (instead of finding in the archive straightaway). More about it was discussed here.
  3. Perform the archiving logic. Described here.

We need the file handle so that we can create a reader from it after the archiving is done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make any sense to me. See my comments on the other PR. I think this is a fundamental point that needs to be resolved before bothering with the rest of the PR.

reader.Metadata is sufficient and necessary for creating readers in all other circumstances except where we already have an open file handle, in which case we can do better.

I do not understand why we would or even could have open file handles at this point, since the archive should only be populated by elements which are discarded from knownFiles, which only contains reader.Metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djaglowski Thank you for the feedback. I realise now that I might have made things too complex with the initial PR, and we don’t actually need to keep files open while archiving.

@VihasMakwana VihasMakwana force-pushed the create-archive-storage-temp branch 4 times, most recently from 3364229 to b4190cf Compare September 13, 2024 17:43
@VihasMakwana VihasMakwana changed the title [fileconsumer/archive] - add signature methods for archive module and enable default [fileconsumer/archive] - add signature methods for archive module Sep 13, 2024
@VihasMakwana VihasMakwana marked this pull request as ready for review September 13, 2024 18:08
@VihasMakwana VihasMakwana requested a review from a team September 13, 2024 18:08
@VihasMakwana
Copy link
Contributor Author

@djaglowski please take a look at this PR. Appreciate your time reviewing this!


Addressing your points below:

What does storing the file handle give us?

As I said, I made things too complex with the initial PR, and we don’t actually need to keep files open while archiving.

Are we sure archive should be separate from tracker? Seems very closely related and kind of awkward to manage both.

Agree. Keeping everything in tracker would make it more maintainable.

reader.Metadata is sufficient and necessary for creating readers in all other circumstances except where we already have an open file handle, in which case we can do better.

This makes sense to me.

@VihasMakwana VihasMakwana force-pushed the create-archive-storage-temp branch from b4190cf to 352e8e4 Compare September 13, 2024 18:18
@VihasMakwana VihasMakwana force-pushed the create-archive-storage-temp branch from 352e8e4 to 077ab35 Compare September 13, 2024 18:22
@VihasMakwana
Copy link
Contributor Author

@djaglowski Thanks for your time in reviewing this PR!
Once this gets merged, I'll roll out another PR with reading from archive logic.
It'll also contain the test cases for this feature.

We can ofcourse break it down in multiple PRs if necessary.

pkg/stanza/fileconsumer/file.go Show resolved Hide resolved
Comment on lines +164 to +166
if t.pollsToArchive <= 0 || t.persister == nil {
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to worry about these cases? Aren't we checking them and instantiating a NoStateTracker when these would ever be relevant?

Copy link
Contributor Author

@VihasMakwana VihasMakwana Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoStateTracker will be created when WithNoTracking option is selected.
We can create a normal tracker with archiving disabled, by setting pollsToArchive: 0.
A tracker with archiving enabled will have pollsToArchive > 0.

The check t.persister == nil is just a precautionary check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Maybe in another PR we can clean this up. IMO we have starting to have too many special cases to handle and the easiest solution to that is move to a pattern where the nils pieces are nops instead. Then the implementation doesn't need to worry about details so much.

@VihasMakwana VihasMakwana force-pushed the create-archive-storage-temp branch 6 times, most recently from 33920e0 to 080bf7a Compare October 9, 2024 06:44
@VihasMakwana VihasMakwana force-pushed the create-archive-storage-temp branch from 080bf7a to 27cc4e1 Compare October 9, 2024 06:44
@djaglowski djaglowski merged commit 19ddd21 into open-telemetry:main Oct 9, 2024
155 checks passed
@github-actions github-actions bot added this to the next release milestone Oct 9, 2024
jmichalek132 pushed a commit to jmichalek132/opentelemetry-collector-contrib that referenced this pull request Oct 10, 2024
…ture (open-telemetry#35098)

Description:

This PR introduces the `archive` method to store files older than 3 poll
cycles.
The core logic for reading will be introduced in a follow up PR.


Link to tracking Issue:
open-telemetry#32727

Testing: To be included in future PRs

Documentation: To be included in future PRs
djaglowski pushed a commit that referenced this pull request Dec 9, 2024
This PR follows
#35098.

### Description

- This PR adds core logic for matching from archive. Check [this
out](#32727 (comment))
for the core logic.

### Future PRs

- As of now, we don't keep track of most recently written index across
collector restarts. This is simple to accomplish and we can use of
persister for this. I haven't implemented this in current PR, as I want
to guide your focus solely towards reading part. We can address this in
this PR (later) or in a separate PR, independently.
- Testing and Enabling: Once we establish common ground on _**reading
from archive**_ matter, we can proceed with testing and enabling the
configuration.
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…ture (open-telemetry#35098)

Description:

This PR introduces the `archive` method to store files older than 3 poll
cycles.
The core logic for reading will be introduced in a follow up PR.


Link to tracking Issue:
open-telemetry#32727

Testing: To be included in future PRs

Documentation: To be included in future PRs
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…y#35798)

This PR follows
open-telemetry#35098.

### Description

- This PR adds core logic for matching from archive. Check [this
out](open-telemetry#32727 (comment))
for the core logic.

### Future PRs

- As of now, we don't keep track of most recently written index across
collector restarts. This is simple to accomplish and we can use of
persister for this. I haven't implemented this in current PR, as I want
to guide your focus solely towards reading part. We can address this in
this PR (later) or in a separate PR, independently.
- Testing and Enabling: Once we establish common ground on _**reading
from archive**_ matter, we can proceed with testing and enabling the
configuration.
AkhigbeEromo pushed a commit to sematext/opentelemetry-collector-contrib that referenced this pull request Jan 13, 2025
…y#35798)

This PR follows
open-telemetry#35098.

### Description

- This PR adds core logic for matching from archive. Check [this
out](open-telemetry#32727 (comment))
for the core logic.

### Future PRs

- As of now, we don't keep track of most recently written index across
collector restarts. This is simple to accomplish and we can use of
persister for this. I haven't implemented this in current PR, as I want
to guide your focus solely towards reading part. We can address this in
this PR (later) or in a separate PR, independently.
- Testing and Enabling: Once we establish common ground on _**reading
from archive**_ matter, we can proceed with testing and enabling the
configuration.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants