Quantise report cache in query side of aws-collector #3671
Conversation
Cache merged groups of reports, to reduce the number we handle in parallel.

Previously we would merge all reports in a 15-second window. Now we use a 'quantum' of 3 seconds, similar to the single-user app. E.g. a 30-node cluster will have 150 individual reports over 15 seconds, but the new code will merge 5 pre-merged reports plus 20-ish very recent individual ones.

This reduces the max heap size used for deserialising, since we only do 3 seconds at once per instance. This improves #3256, but I'm not marking it as a fix since the number done in parallel is still unlimited.

Individual reports are still put into the cache, but should get displaced by the pre-merged ones under LRU.
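As a rough illustration of the idea only (not the PR's actual code; the helper name and the exact loop bounds here are assumptions), the query path rounds the window start down to a quantum boundary, walks whole quanta, and leaves a short tail for individual reports:

package main

import (
    "fmt"
    "time"
)

const (
    quantum     = 3 * time.Second
    gracePeriod = 500 * time.Millisecond
)

// quantumStarts is a hypothetical helper: it returns the start of each whole
// quantum covering [start, end), plus the point from which individual
// (not pre-merged) reports would still be fetched.
func quantumStarts(start, end time.Time) (starts []time.Time, tail time.Time) {
    q := quantum.Nanoseconds()
    startTS, endTS := start.UnixNano(), end.UnixNano()
    ts := startTS - (startTS % q) // round down to a quantum boundary
    for ; ts+q <= endTS-gracePeriod.Nanoseconds(); ts += q {
        starts = append(starts, time.Unix(0, ts))
    }
    return starts, time.Unix(0, ts)
}

func main() {
    end := time.Now()
    starts, tail := quantumStarts(end.Add(-15*time.Second), end)
    fmt.Printf("%d whole quanta, individual reports from %v onwards\n", len(starts), tail)
}

For a 15-second window this yields roughly five whole quanta plus the very recent individual reports, matching the example above.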
app/multitenant/aws_collector.go
Outdated
reportQuantisationInterval = 3000000000 // 3 seconds in nanoseconds
// Grace period allows for some gap between the timestamp on reports
// (assigned when they arrive at collector) and them appearing in DynamoDB query
gracePeriod = 500000000 // 1/2 second in nanoseconds
Why are these not 3 * time.Second and 500 * time.Millisecond, respectively?
Because timestamps are stored in the database as a number of nanoseconds since the epoch, and manipulated in the program as an int64. I could easily make these constants time.Duration and call .Nanoseconds() on every use; it would be somewhat harder to convert all the timestamps into time.Time and get the modulus arithmetic correct.
I did "make these constants time.Duration
and call .Nanoseconds()
on every use"
app/multitenant/aws_collector.go
Outdated
var reports []report.Report
// Fetch a merged report for each time quantum covering the window
startTS, endTS := start.UnixNano(), end.UnixNano()
ts := startTS - (startTS % reportQuantisationInterval)
This will include reports up to reportQuantisationInterval earlier than startTS. It seems odd to 'run over' like that at the start of the interval but not the end (where we stop short and then fetch individual reports).
Yes, it extends the window up to 3 seconds into the past; this is assumed not to be noticeable.
We stop short at the near end of the window on the assumption that reports in that range could still arrive in the database.
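To make the round-down concrete (the timestamps here are made up for the example):

package main

import (
    "fmt"
    "time"
)

func main() {
    const quantum = int64(3 * time.Second)
    // A window starting 1.2s past a quantum boundary...
    start := time.Date(2019, 6, 1, 17, 0, 1, 200000000, time.UTC)
    startTS := start.UnixNano()
    // ...is rounded down to that boundary, i.e. the fetch reaches back
    // 1.2s (never more than 3s) before the requested start.
    ts := startTS - (startTS % quantum)
    fmt.Println(time.Unix(0, ts).UTC()) // 2019-06-01 17:00:00 +0000 UTC
}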
return report.MakeReport(), err
}
reports = append(reports, quantumReport)
}
I was wondering whether that entire loop could operate on time.Time, converting to unix nanos only in the call to (or even inside) reportForQuantum.
It’s the modulus that scares me.
I’ve made one bug that way already this year.
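For what it's worth, a sketch of how the time.Time variant could look, using Truncate instead of the explicit modulus (purely illustrative; the loop bounds are assumptions, not the PR's code):

package main

import (
    "fmt"
    "time"
)

const (
    quantum     = 3 * time.Second
    gracePeriod = 500 * time.Millisecond
)

func main() {
    end := time.Now()
    start := end.Add(-15 * time.Second)

    // Truncate rounds down to a multiple of quantum measured from Go's zero
    // time; since the zero time and the Unix epoch differ by a whole number
    // of days (and a day is a multiple of 3s), for a 3s quantum this lands on
    // the same boundaries as flooring UnixNano() to a multiple of 3e9.
    for ts := start.Truncate(quantum); ts.Add(quantum).Before(end.Add(-gracePeriod)); ts = ts.Add(quantum) {
        fmt.Println("merged quantum starting at", ts.UnixNano())
    }
}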