For long running connections, place files in current day #119

Merged
gfr10598 merged 2 commits into master from long-running
Apr 2, 2020

Contributor

@gfr10598 gfr10598 commented Apr 2, 2020

This changes the file sequence behavior to place each file in the directory for the date on which its segment started, instead of the date on which the connection started. This simplifies parsing and deduplication, and makes it easier to find all connections for a given time period.

Note that during the first 10 minutes after midnight (UTC), some connection segments may still appear in the previous day's data. This is still much better than before.



@coveralls
coveralls commented Apr 2, 2020

Pull Request Test Coverage Report for Build 881

  • 7 of 10 (70.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.1%) to 83.243%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
saver/saver.go | 7 | 10 | 70.0%

Totals (Coverage Status):
Change from base Build 876: -0.1%
Covered Lines: 1078
Relevant Lines: 1295

Contributor

@stephen-soltesz stephen-soltesz left a comment

Reviewable status: 0 of 1 LGTMs obtained


saver/saver.go, line 114 at r1 (raw file):

Quoted 4 lines of code…
// Note that long running connections will have data in multiple directories,
// and dates in later filenames will not match directory.
// Prior to April 2020, the files were placed in date corresponding to
// connection's start time.

This is a small but significant change.

I found this hard to grok on a single pass.

How about:

Note: Prior to April 2020, files for the lifetime of a connection were placed in a date-directory
corresponding to the connection's start time. Now, long running connections may have data in
multiple directories. The date-directory containing later files will not match the start date of the
connection.


saver/saver.go, line 119 at r1 (raw file):

func (conn *Connection) Rotate(Host string, Pod string, FileAgeLimit time.Duration) error {
	datePath := conn.StartTime.Format("2006/01/02")
	// For long running connections, later blocks may

incomplete sentence?


saver/saver.go, line 122 at r1 (raw file):

	if conn.Sequence > 0 {
		now := time.Now().UTC()
		datePath = now.Format("2006/01/02")

Does this block need to be conditional? If so, please add a comment.


saver/saver.go, line 337 at r1 (raw file):

		// TODO - we only need to collect these stats if this is a reporting cycle.
		// NOTE: Prior to April 2020, we were not using UTC here.
		s4, r4 := svr.handleType(msgs.V4Time.UTC(), msgs.V4Messages)

This is a good policy. Our nodes should be set to UTC time. So, I hope this is a no-effective-change change. If you've seen evidence that doesn't match nodes-run-utc time, please let me know.

Contributor

@pboothe pboothe left a comment

Wording suggestions, but the change itself looks good to me. When Stephen approves, so do I.

Reviewable status: 0 of 1 LGTMs obtained


saver/saver.go, line 113 at r1 (raw file):

// Rotate opens the next writer for a connection.
// Note that long running connections will have data in multiple directories,

because, for all segments after the first one, we choose the directory based on the time Rotate() was called, and not on the StartTime of the connection. Long-running connections with data on multiple days will therefore likely have data in multiple date directories. (This behavior is new as of April 2020. Prior to then, all files were placed in the directory corresponding to the StartTime.)

Contributor Author

@gfr10598 gfr10598 left a comment

Reviewable status: 0 of 1 LGTMs obtained


saver/saver.go, line 113 at r1 (raw file):

Previously, pboothe (Peter Boothe) wrote…

because, for all segments after the first one, we choose the directory based on the time Rotate() was called, and not on the StartTime of the connection. Long-running connections with data on multiple days will therefore likely have data in multiple date directories. (This behavior is new as of April 2020. Prior to then, all files were placed in the directory corresponding to the StartTime.)

Using this language.


saver/saver.go, line 114 at r1 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…
// Note that long running connections will have data in multiple directories,
// and dates in later filenames will not match directory.
// Prior to April 2020, the files were placed in date corresponding to
// connection's start time.

This is a small but significant change.

I found this hard to grok on a single pass.

How about:

Note: Prior to April 2020, files for the lifetime of a connection were placed in a date-directory
corresponding to the connection's start time. Now, long running connections may have data in
multiple directories. The date-directory containing later files will not match the start date of the
connection.

Using Peter's language.


saver/saver.go, line 119 at r1 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

incomplete sentence?

Done.


saver/saver.go, line 122 at r1 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Does this block need to be conditional? If so, please add a comment.

Done.


saver/saver.go, line 337 at r1 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

This is a good policy. Our nodes should be set to UTC time. So, I hope this is a no-effective-change change. If you've seen evidence that doesn't match nodes-run-utc time, please let me know.

Ah - that explains it. I was wondering why this wasn't causing problems. I'm updating the comment though.

Contributor Author

@gfr10598 gfr10598 left a comment

ptal

Reviewable status: 0 of 1 LGTMs obtained

Contributor

@stephen-soltesz stephen-soltesz left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained

@gfr10598 gfr10598 merged commit 2fe3e86 into master Apr 2, 2020
@gfr10598 gfr10598 deleted the long-running branch April 2, 2020 21:18