For long running connections, place files in current day #119
Pull Request Test Coverage Report for Build 881
Reviewable status: 0 of 1 LGTMs obtained
saver/saver.go, line 114 at r1 (raw file):
// Note that long running connections will have data in multiple directories,
// and dates in later filenames will not match directory.
// Prior to April 2020, the files were placed in date corresponding to
// connection's start time.
This is a small but significant change.
I found this hard to grok on a single pass.
How about:
Note: Prior to April 2020, files for the lifetime of a connection were placed in a date-directory
corresponding to the connection's start time. Now, long running connections may have data in
multiple directories. The date-directory containing later files will not match the start date of the
connection.
saver/saver.go, line 119 at r1 (raw file):
func (conn *Connection) Rotate(Host string, Pod string, FileAgeLimit time.Duration) error {
	datePath := conn.StartTime.Format("2006/01/02")
	// For long running connections, later blocks may
incomplete sentence?
saver/saver.go, line 122 at r1 (raw file):
	if conn.Sequence > 0 {
		now := time.Now().UTC()
		datePath = now.Format("2006/01/02")
Does this block need to be conditional? If so, please add a comment.
saver/saver.go, line 337 at r1 (raw file):
// TODO - we only need to collect these stats if this is a reporting cycle.
// NOTE: Prior to April 2020, we were not using UTC here.
s4, r4 := svr.handleType(msgs.V4Time.UTC(), msgs.V4Messages)
This is a good policy. Our nodes should be set to UTC time. So, I hope this is a no-effective-change change. If you've seen evidence that doesn't match nodes-run-utc time, please let me know.
Wording suggestions, but the change itself looks good to me. When Stephen approves, so do I.
Reviewable status: 0 of 1 LGTMs obtained
saver/saver.go, line 113 at r1 (raw file):
// Rotate opens the next writer for a connection.
// Note that long running connections will have data in multiple directories,
because, for all segments after the first one, we choose the directory based on the time Rotate() was called, and not on the StartTime of the connection. Long-running connections with data on multiple days will therefore likely have data in multiple date directories. (This behavior is new as of April 2020. Prior to then, all files were placed in the directory corresponding to the StartTime.)
Reviewable status: 0 of 1 LGTMs obtained
saver/saver.go, line 113 at r1 (raw file):
Previously, pboothe (Peter Boothe) wrote…
because, for all segments after the first one, we choose the directory based on the time Rotate() was called, and not on the StartTime of the connection. Long-running connections with data on multiple days will therefore likely have data in multiple date directories. (This behavior is new as of April 2020. Prior to then, all files were placed in the directory corresponding to the StartTime.)
Using this language.
saver/saver.go, line 114 at r1 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
// Note that long running connections will have data in multiple directories,
// and dates in later filenames will not match directory.
// Prior to April 2020, the files were placed in date corresponding to
// connection's start time.
This is a small but significant change.
I found this hard to grok on a single pass.
How about:
Note: Prior to April 2020, files for the lifetime of a connection were placed in a date-directory
corresponding to the connection's start time. Now, long running connections may have data in
multiple directories. The date-directory containing later files will not match the start date of the
connection.
Using Peter's language.
saver/saver.go, line 119 at r1 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
incomplete sentence?
Done.
saver/saver.go, line 122 at r1 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Does this block need to be conditional? If so, please add a comment.
Done.
saver/saver.go, line 337 at r1 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
This is a good policy. Our nodes should be set to UTC time. So, I hope this is a no-effective-change change. If you've seen evidence that doesn't match nodes-run-utc time, please let me know.
Ah - that explains it. I was wondering why this wasn't causing problems. I'm updating the comment though.
ptal
Reviewable status: 0 of 1 LGTMs obtained
Reviewable status: complete! 1 of 1 LGTMs obtained
This changes the file sequence behavior to place each file in the directory for the date on which its segment started, instead of the date on which the connection started. This simplifies parsing and deduping, and makes it easier to find all connections for a given time period.
Note that the first 10 minutes after midnight (UTC) may have some connection segments that will appear in the previous day's data. But this is much better than before.