Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantise report cache in query side of aws-collector #3671

Merged
merged 4 commits into from
Sep 13, 2019

Conversation

bboreham
Copy link
Collaborator

@bboreham bboreham commented Sep 9, 2019

Cache merged groups of reports, to reduce the number we handle in parallel.

Previously we would merge all reports in a 15-second window. Now we use a 'quantum' of 3 seconds, similar to the single-user app.

E.g. a 30-node cluster will have 150 individual reports over 15 seconds, but the new code will merge 5 pre-merged reports plus 20-ish very recent individual ones.

This reduces the max heap size used for deserialising, since we only do 3 seconds at once per instance. This improves #3256, but I'm not marking it as a fix since the number done in parallel is still unlimited.

Individual reports are still put into the cache, but should get displaced by the pre-merged ones under LRU.

…allel

Previously we would merge all reports in a 15-second window.
Now we use a 'quantum' of 3 seconds, similar to the single-user app.

E.g. a 30-node cluster will have 150 individual reports over 15
seconds, but the new code will merge 5 pre-merged reports plus 20-ish
very recent individual ones.

This limits the max heap size used for deserialising, since we only do
3 seconds at once per instance.

Individual reports are still put into the cache, but should get
displaced by the pre-merged ones under LRU.
@bboreham bboreham requested a review from qiell as a code owner September 9, 2019 10:04
@bboreham bboreham changed the title Quantise report cache in aws-collector Quantise report cache in query side of aws-collector Sep 11, 2019
reportQuantisationInterval = 3000000000 // 3 seconds in nanoseconds
// Grace period allows for some gap between the timestamp on reports
// (assigned when they arrive at collector) and them appearing in DynamoDB query
gracePeriod = 500000000 // 1/2 second in nanoseconds
Copy link
Member

@rade rade Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these not 3 * time.Second and 500 * time.Millisecond, respectively?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because timestamps are stored in the database as a number of nanoseconds since the epoch, manipulated in the program as int64 type.

I could easily make these constants time.Duration and call .Nanoseconds() on every use, somewhat harder to convert all the timestamps into time.Time and get the modulus arithmetic correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did "make these constants time.Duration and call .Nanoseconds() on every use"

var reports []report.Report
// Fetch a merged report for each time quantum covering the window
startTS, endTS := start.UnixNano(), end.UnixNano()
ts := startTS - (startTS % reportQuantisationInterval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will include reports up to reportQuantisationInterval earlier than startTS. It seems odd to 'run over' like that at the start of the interval but not the end (where we stop short and then fetch individual reports).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it extends the window up to 3 seconds into the past. This is assumed to be not noticeable.
We stop short at the near end of the window on the assumption that reports in that range could still arrive in the database.

return report.MakeReport(), err
}
reports = append(reports, quantumReport)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether that entire loop could operate on time.Time, converting to unix nanos only in the call to (or even inside) reportForQuantum.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s the modulus that scares me.
I’ve made one bug that way already this year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants