
sql: add support for transaction level stats collection #52704

Merged
merged 2 commits on Aug 21, 2020

Conversation

arulajmani
Collaborator

@arulajmani arulajmani commented Aug 12, 2020

This patch adds the framework to start collecting transaction level
statistics and return them on the statements end point. Previously,
the response on the end point was being faked. This patch replaces the
fake response by an actual response with some fields correctly
populated. Other fields will be populated in upcoming patches, and
currently contain garbage data.

The correctly populated fields on CollectedTransactionStatistics are:

  1. StatementIDs
  2. Count
  3. ServiceLat
  4. App
  5. RetryLat
  6. MaxRetries
  7. CommitLat

Release note: None


Contributor

@solongordon solongordon left a comment


Looks like a good start! I want to be sensitive to the fact that transactions might have a very large number of statements in them in pathological cases. (Not even especially pathological. Netflix cited a real example where a user had a transaction with 100,000 insert statements in it. This is not advisable but also a realistic thing that users might do.)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/server/statements.go, line 90 at r2 (raw file):

	// TODO(arul): This assigning IDs should probably happen when we store
	//  the statements in the statsCollector or when we get them out of the
	//  stats collector.

Agree. We need to ensure that these IDs match the ones in the transaction stats, so ideally we would only generate this hash once and use it in both places.


pkg/sql/app_stats.go, line 48 at r2 (raw file):

// txnKey is the concatenation of all stmtKey.id's for all the statements in the
// transaction
type txnKey string

I'm a bit concerned about the potential for giant strings here. (Imagine the case where a transaction has 100k statements in it.) What if we made this a hash of all the statement fingerprints?
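
For illustration, the hashing approach suggested here could look like the following Go sketch. The function name and shape are hypothetical, not CockroachDB's actual code; a real implementation would hash the same fingerprints the stats collector already has.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// txnFingerprint sketches the suggestion above: rather than concatenating
// every statement fingerprint into one potentially giant string, fold each
// fingerprint into a fixed-size 64-bit FNV hash.
func txnFingerprint(stmtFingerprints []string) uint64 {
	h := fnv.New64()
	for _, fp := range stmtFingerprints {
		_, _ = h.Write([]byte(fp)) // Write on an fnv hash never returns an error.
		_, _ = h.Write([]byte{0})  // separator so ["ab"] and ["a", "b"] hash differently
	}
	return h.Sum64()
}

func main() {
	key := txnFingerprint([]string{"INSERT INTO t VALUES (_)", "SELECT * FROM t WHERE k = _"})
	fmt.Printf("txnKey: %x\n", key) // fixed 8 bytes regardless of statement count
}
```

The key stays 8 bytes even for a 100k-statement transaction, and it remains order-sensitive, so transactions with the same statements in a different order still bucket separately.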


pkg/sql/conn_executor.go, line 1016 at r2 (raw file):

		// transactionStatements tracks all statements that make up the current
		// transaction.
		transactionStatements []*Statement

Is it possible we already have this data somewhere? It feels like it must be necessary to support transaction retries.

If we do need to add this, it seems like all you really need is the statement ID, so holding a pointer to the entire statement feels excessive.

@arulajmani arulajmani force-pushed the transaction-level-stats branch from 0dc35d3 to 8434e73 Compare August 18, 2020 18:19
Collaborator Author

@arulajmani arulajmani left a comment


Still have one outstanding comment left to address and a couple of TODOs in the code, but I think this is in a place where I can un-WIP this to get some more feedback.

Also left an inline comment about the structure of the response I'd like clarification on.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, solongordon (Solon) wrote…

Agree. We need to ensure that these IDs match the ones in the transaction stats, so ideally we would only generate this hash once and use it in both places.

Is there a reason we have this ID field on roachpb.StatementStatisticsKey instead of roachpb.StatementStatistics? I think it would be better to have the ID on the value (roachpb.StatementStatistics) because ID is a derived field and not part of the key used to bucket statements.

Also I'm not completely sure why there is a need to organize this stuff as CollectedStatementStatistics which contains StatementStatisticsKey and StatementStatistics. Whatever the reason may be, I wonder if that will apply to transaction statistics as well. Do we need a CollectedTransactionStatistics which contains a TransactionStatisticsKey and TransactionStatistics? Currently we only have a flat TransactionStatistics structure in the response.


pkg/sql/app_stats.go, line 48 at r2 (raw file):

Previously, solongordon (Solon) wrote…

I'm a bit concerned about the potential for giant strings here. (Imagine the case where a transaction has 100k statements in it.) What if we made this a hash of all the statement fingerprints?

Done.

@arulajmani arulajmani marked this pull request as ready for review August 18, 2020 18:23
@arulajmani arulajmani requested a review from solongordon August 18, 2020 18:23
@arulajmani arulajmani force-pushed the transaction-level-stats branch from 8434e73 to dc0ac2b Compare August 18, 2020 19:23
Contributor

@solongordon solongordon left a comment


Reviewed 3 of 19 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/roachpb/app_stats.proto, line 121 at r4 (raw file):

  // CommitLat is the amount of time required to commit the transaction after
  // all statement operations have been applied.
  optional NumericStat commit_lat = 8 [(gogoproto.nullable) = false];

Seems like this isn't populated yet, though it's fine with me to save that for a follow-up PR.


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Is there a reason we have this ID field on roachpb.StatementStatisticsKey instead of roachpb.StatementStatistics? I think it would be better to have the ID on the value (roachpb.StatementStatistics) because ID is a derived field and not part of the key used to bucket statements.

Also I'm not completely sure why there is a need to organize this stuff as CollectedStatementStatistics which contains StatementStatisticsKey and StatementStatistics. Whatever the reason may be, I wonder if that will apply to transaction statistics as well. Do we need a CollectedTransactionStatistics which contains a TransactionStatisticsKey and TransactionStatistics? Currently we only have a flat TransactionStatistics structure in the response.

I see what you mean about having ID in the Key but putting it in the value feels weird to me too. What about adding it to CollectedStatementStatistics?

I would guess the reason for CollectedStatementStatistics is just that protobuf doesn't have a map datatype, so someone created that type to represent the key-value pairs. We might as well mimic that for transactions to keep things analogous I suppose.


pkg/sql/app_stats.go, line 48 at r4 (raw file):

// txnKey is the concatenation of all stmtKey.id's for all the statements in the
// transaction

Out of date comment


pkg/sql/app_stats.go, line 77 at r4 (raw file):

	mu struct {
		syncutil.Mutex
		// TODO(arul): Can we rename this without breaking stuff?

What specifically do you want to rename? I think you can rename most anything except protobuf rpc names.


pkg/sql/app_stats.go, line 504 at r4 (raw file):

}

func constructStatementID(queryFingerprint string) string {

As you pointed out, it seems like we need to take the entire statement key into account here to avoid conflicts.


pkg/sql/backfill.go, line 150 at r4 (raw file):

// This operates over multiple goroutines concurrently and is thus not
// able to reuse the original kv.Txn safely. The various
// function that it calls make their own txnCounts.

I think you had a rename go rogue here. Lots of unintended txn -> txnCounts replacements.


pkg/sql/conn_executor.go, line 1016 at r2 (raw file):

Previously, solongordon (Solon) wrote…

Is it possible we already have this data somewhere? It feels like it must be necessary to support transaction retries.

If we do need to add this, it seems like all you really need is the statement ID, so holding a pointer to the entire statement feels excessive.

This comment still stands.

Contributor

@solongordon solongordon left a comment


By the way, you should run make bin/protoc-gen-gogoroach to avoid the churn in mvcc3.pb.go and timestamp.pb.go due to the Go 1.13 downgrade.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)

@arulajmani arulajmani force-pushed the transaction-level-stats branch 2 times, most recently from 0d0f549 to 4c7afcb Compare August 19, 2020 21:50
Collaborator Author

@arulajmani arulajmani left a comment


Made some more changes on the general structure and addressed all outstanding comments. Should be RFAL.

There are a few fields left unpopulated for now; I'll either update this branch or do them as follow-ups to this PR, depending on how soon we are able to merge this big change!

Reviewed 1 of 23 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @solongordon)


pkg/roachpb/app_stats.proto, line 121 at r4 (raw file):

Previously, solongordon (Solon) wrote…

Seems like this isn't populated yet, though it's fine with me to save that for a follow-up PR.

Realized I had the wrong list of fields populated in my commit desc, fixed that now! I'll try and push this (and anything else left) on this branch, but if this gets approved before, I'll save it for a follow-up.


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, solongordon (Solon) wrote…

I see what you mean about having ID in the Key but putting it in the value feels weird to me too. What about adding it to CollectedStatementStatistics?

I would guess the reason for CollectedStatementStatistics is just that protobuf doesn't have a map datatype, so someone created that type to represent the key-value pairs. We might as well mimic that for transactions to keep things analogous I suppose.

I'm adding this to CollectedStatementStatistics for now. How do you feel about de-coupling this proto structure from this internal map (appStats.stmts) that we have? I don't see a good reason to have this stuff mirrored.

We should still have CollectedStatementStatistics which contains StatementStatistics inside it, but we should get rid of StatementStatisticsKey. All the fields on the key right now should be added to CollectedStatementStatistics as flat fields, as they are immutable (including the StatementID).

The reason I'm proposing this de-coupling is because I want to change the appStats.stmts map's key. Ideally, we should use the statement ID as the key, as it is a hash of all the fields currently in the stmtKey. Right now, I'm storing the Statement ID as a value (unprotected by a mutex, as it is only set on creation). Ideally, I'd flip this around -- the fields on stmtKey would be stored as immutable fields and the Statement ID would serve as the key. By making this change, we guard ourselves against future code that wants to add/remove axes for statement bucketing. Adding a new axis will require changing how the Statement ID is constructed, which will preserve the 1:1 mapping between an ID recorded by a txn and IDs corresponding to statements. Probably won't do this as part of this PR, as it's slightly involved, but I'm in favour of decoupling the map and protobuf here. Lemme know what you think!
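
A rough sketch of the "hash the entire statement key" idea discussed here, with hypothetical field names (the real stmtKey in pkg/sql has its own set of bucketing fields):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
)

// stmtKeyFields stands in for the axes a statement is bucketed by;
// these fields are illustrative, not CockroachDB's actual stmtKey.
type stmtKeyFields struct {
	queryFingerprint string
	implicitTxn      bool
	failed           bool
}

// constructStatementID hashes every bucketing axis, not just the query
// fingerprint, so two statements that differ only in a flag get distinct IDs.
func constructStatementID(k stmtKeyFields) string {
	h := fnv.New64()
	_, _ = h.Write([]byte(k.queryFingerprint))
	_, _ = h.Write([]byte(strconv.FormatBool(k.implicitTxn)))
	_, _ = h.Write([]byte(strconv.FormatBool(k.failed)))
	return strconv.FormatUint(h.Sum64(), 16)
}

func main() {
	a := constructStatementID(stmtKeyFields{"SELECT * FROM t", true, false})
	b := constructStatementID(stmtKeyFields{"SELECT * FROM t", false, false})
	fmt.Println(a == b) // same fingerprint, different bucketing axis, different ID
}
```

Because the ID is derived from all key fields, adding a new bucketing axis later only requires feeding one more field into the hash, keeping statement IDs and transaction-recorded IDs in sync.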


pkg/sql/app_stats.go, line 48 at r4 (raw file):

Previously, solongordon (Solon) wrote…

Out of date comment

Done.


pkg/sql/app_stats.go, line 77 at r4 (raw file):

Previously, solongordon (Solon) wrote…

What specifically do you want to rename? I think you can rename most anything except protobuf rpc names.

I was thinking of renaming this to TxnCounts to make it in line with the enclosing struct.

I'm open to suggestions on how we can rename this (and the enclosing struct) now that we have these new transaction statistics. transactionCounts is sort of correct but not really: this struct captures very coarse-grained transaction stats, like the number of transactions started, the number committed, the number that were implicit, and the average time for all transactions.


pkg/sql/app_stats.go, line 504 at r4 (raw file):

Previously, solongordon (Solon) wrote…

As you pointed out, it seems like we need to take the entire statement key into account here to avoid conflicts.

Done.


pkg/sql/backfill.go, line 150 at r4 (raw file):

Previously, solongordon (Solon) wrote…

I think you had a rename go rogue here. Lots of unintended txn -> txnCounts replacements.

:( :( reverted, hopefully I didn't miss any.


pkg/sql/conn_executor.go, line 1016 at r2 (raw file):

Previously, solongordon (Solon) wrote…

This comment still stands.

Looks like we have this txnRewindPos, but it's only defined when the txn is open. This txnRewindPos refers to the position in the buffer. Maybe we could use it, but we'd have to extend it to ensure the buffer isn't cleared when we reach the transaction recording stuff. Seems like quite a lot of effort for too little reward, so I'm taking your second recommendation of recording just the statement IDs. This way, we only calculate this statement ID stuff once, which I quite like.
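
The approach settled on here can be sketched as follows: the executor keeps only the IDs of statements run in the current transaction, not pointers to whole Statement objects. Type and field names are illustrative, not the actual conn_executor code.

```go
package main

import "fmt"

// stmtID stands in for the hashed statement fingerprint ID.
type stmtID string

// txnStatsCollector accumulates just the statement IDs for the current
// transaction, which is all the transaction-level stats need.
type txnStatsCollector struct {
	statementIDs []stmtID
}

// recordStatement appends the already-computed ID, so the hash is
// calculated once per statement execution.
func (t *txnStatsCollector) recordStatement(id stmtID) {
	t.statementIDs = append(t.statementIDs, id)
}

// reset clears the collector when the transaction finishes, reusing
// the underlying slice storage.
func (t *txnStatsCollector) reset() {
	t.statementIDs = t.statementIDs[:0]
}

func main() {
	var ts txnStatsCollector
	ts.recordStatement("a1b2")
	ts.recordStatement("c3d4")
	fmt.Println(len(ts.statementIDs)) // 2
	ts.reset()
	fmt.Println(len(ts.statementIDs)) // 0
}
```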

@arulajmani arulajmani requested a review from solongordon August 19, 2020 21:54
@arulajmani arulajmani force-pushed the transaction-level-stats branch from 4c7afcb to 6caf140 Compare August 19, 2020 21:55
To support adding transaction-level statistics to the webui, I modified
the existing StatementsResponse to include a new `transactions` field.
Transactions here are defined by the collection of statement
fingerprints within. To allow referencing statements, I added a
statement ID field, which is a hash of the statement fingerprint.

For the moment, transactions in the response are faked out by combining
together statement IDs at random. This is meant to unblock development
on the frontend while we work on populating the response with the real
transaction data.

Release note: None
@arulajmani arulajmani force-pushed the transaction-level-stats branch from 6caf140 to c40d784 Compare August 19, 2020 22:43
@arulajmani
Collaborator Author

Rebased master to get rid of some merge conflicts.

@arulajmani arulajmani force-pushed the transaction-level-stats branch from c40d784 to 6bc3fb2 Compare August 20, 2020 03:58
@arulajmani
Collaborator Author

Added tracking for commit latency as well. The test I wrote is failing on stress though; I'll debug that tomorrow.

@arulajmani arulajmani force-pushed the transaction-level-stats branch from 6bc3fb2 to b9710fd Compare August 20, 2020 14:54
Contributor

@solongordon solongordon left a comment


Reviewed 5 of 14 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/roachpb/app_stats.proto, line 158 at r7 (raw file):

message CollectedStatementStatistics {
  // ID is a hash of the statement key (query fingerprint, vectorized or not,
  // implicit txn or not, etc) which can be used to identify the statement

Nit: Turns out vectorized is not considered here, right?


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I'm adding this to CollectedStatementStatistics for now. How do you feel about de-coupling this proto structure from this internal map (appStats.stmts) that we have? I don't see a good reason to have this stuff mirrored.

We should still have CollectedStatementStatistics which contains StatementStatistics inside it, but we should get rid of StatementStatisticsKey. All the fields on the key right now should be added to CollectedStatementStatistics as flat fields, as they are immutable (including the StatementID).

The reason I'm proposing this de-coupling is because I want to change the appStats.stmts map's key. Ideally, we should use the statement ID as the key, as it is a hash of all the fields currently in the stmtKey. Right now, I'm storing the Statement ID as a value (unprotected by a mutex, as it is only set on creation). Ideally, I'd flip this around -- the fields on stmtKey would be stored as immutable fields and the Statement ID would serve as the key. By making this change, we guard ourselves against future code that wants to add/remove axes for statement bucketing. Adding a new axis will require changing how the Statement ID is constructed, which will preserve the 1:1 mapping between an ID recorded by a txn and IDs corresponding to statements. Probably won't do this as part of this PR, as it's slightly involved, but I'm in favour of decoupling the map and protobuf here. Lemme know what you think!

I'm fine with them being de-coupled though I agree we shouldn't change this as part of this PR. Let's revisit whether it makes sense to use the hash as a key here or not.


pkg/server/status_test.go, line 1615 at r7 (raw file):

	sort.Ints(expectedCounts)

	if !reflect.DeepEqual(respCounts, expectedCounts) {

I wonder if you can find a way to make this test a bit more robust and debuggable so you're not just comparing a list of counts. A random idea I had was that when you're running the queries, you could set a different application name for each transaction fingerprint to make it easier to distinguish them when verifying the results. Maybe you can come up with something better.


pkg/sql/app_stats.go, line 77 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I was thinking of renaming this to TxnCounts to make it in line with the enclosing struct.

I'm open to suggestions on how we can rename this (and the enclosing struct) now that we have these new transaction statistics. transactionCounts is sort of correct but not really: this struct captures very coarse-grained transaction stats, like the number of transactions started, the number committed, the number that were implicit, and the average time for all transactions.

That makes sense. Let's come back to it in a subsequent PR.


pkg/sql/app_stats.go, line 169 at r7 (raw file):

	createIfNonExistent := true
	// If the statement is below the latency threshold, we don't need to create
	// an entry in the stmts map for it.

Hmm, this is interesting. It seems like if this setting is enabled, we may have statement IDs which don't have a corresponding entry in the statement statistics. That could cause trouble in the UI. Don't worry about addressing that in this PR but I think we'll have to revisit this and think about a solution.


pkg/sql/app_stats.go, line 662 at r7 (raw file):

			distSQLUsed := stats.mu.distSQLUsed
			vectorized := stats.mu.vectorized
			stats.mu.Unlock()

Why did this change?


pkg/sql/conn_executor.go, line 1016 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Looks like we have this txnRewindPos, but it's only defined when the txn is open. This txnRewindPos refers to the position in the buffer. Maybe we could use it, but we'd have to extend it to ensure the buffer isn't cleared when we reach the transaction recording stuff. Seems like quite a lot of effort for too little reward, so I'm taking your second recommendation of recording just the statement IDs. This way, we only calculate this statement ID stuff once, which I quite like.

Works for me!


pkg/sql/conn_executor_exec.go, line 1431 at r7 (raw file):

// recordTransactionStart records the start of the transaction and returns
// closures to be called once the transaction finishes or if they transaction

s/they/the/

@arulajmani arulajmani force-pushed the transaction-level-stats branch 2 times, most recently from 31e6291 to db3cb4f Compare August 20, 2020 19:07
Collaborator Author

@arulajmani arulajmani left a comment


Updated the test, addressed all comments, and fixed a data race caught by CI. Should be RFAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @solongordon)


pkg/roachpb/app_stats.proto, line 158 at r7 (raw file):

Previously, solongordon (Solon) wrote…

Nit: Turns out vectorized is not considered here, right?

Yeah, this changed when I rebased! Fixed


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, solongordon (Solon) wrote…

I'm fine with them being de-coupled though I agree we shouldn't change this as part of this PR. Let's revisit whether it makes sense to use the hash as a key here or not.

Makes sense. I've left a TODO for myself where we define the in-memory data structure to consider this.


pkg/server/status_test.go, line 1615 at r7 (raw file):

Previously, solongordon (Solon) wrote…

I wonder if you can find a way to make this test a bit more robust and debuggable so you're not just comparing a list of counts. A random idea I had was that when you're running the queries, you could set a different application name for each transaction fingerprint to make it easier to distinguish them when verifying the results. Maybe you can come up with something better.

Updated the test by tying queries to separate app names. This test seems much better now, lemme know what you think!


pkg/sql/app_stats.go, line 169 at r7 (raw file):

Previously, solongordon (Solon) wrote…

Hmm, this is interesting. It seems like if this setting is enabled, we may have statement IDs which don't have a corresponding entry in the statement statistics. That could cause trouble in the UI. Don't worry about addressing that in this PR but I think we'll have to revisit this and think about a solution.

Yeah, but it's important to get the statementID so that we don't bucket together transactions which are identical except for a statement that doesn't meet the threshold. :/


pkg/sql/app_stats.go, line 662 at r7 (raw file):

Previously, solongordon (Solon) wrote…

Why did this change?

Result of a rebase -- I think Radu recently moved vectorized and distSQLUsed onto the value of the statement statistics map. The reasoning is that we don't expect this stuff to change much from one execution to another. This changed the semantics of key construction, and vectorized/distSQLUsed now refer to the last execution of that key. I added this mu struct for fields that change from one exec to the next (all fields bar statementID).


pkg/sql/conn_executor_exec.go, line 1431 at r7 (raw file):

Previously, solongordon (Solon) wrote…

s/they/the/

Done.

@arulajmani arulajmani requested a review from solongordon August 20, 2020 19:08
Contributor

@solongordon solongordon left a comment


:lgtm:

Reviewed 4 of 5 files at r8.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/server/status_test.go, line 1615 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Updated the test by tying queries to separate app names. This test seems much better now, lemme know what you think!

Better, thanks!


pkg/sql/app_stats.go, line 169 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Yeah, but it's important to get the statementID so that we don't bucket transactions which are equal bar a statement that doesn't meet the threshold. :/

Agree.


pkg/sql/app_stats.go, line 662 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Result of a rebase -- I think Radu recently moved vectorized and distSQLUsed onto the value of the statement statistics map. The reasoning is that we don't expect this stuff to change much from one execution to another. This changed the semantics of key construction, and vectorized/distSQLUsed now refer to the last execution of that key. I added this mu struct for fields that change from one exec to the next (all fields bar statementID).

Oh, I meant why did it move outside the if ok block?

Contributor

@solongordon solongordon left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/server/status_test.go, line 1588 at r8 (raw file):

	for i := 1; i <= len(testCases); i++ {
		appName := fmt.Sprintf("app%d", i)
		tc := testCases[appName]

I don't know that this is worth changing but this method of iteration is kind of icky. I'd rather see the test cases as a slice without specifying the app name. Then this part could iterate over that slice, generate the app name, and add the test case to a map. That way you don't have this brittle situation where things break if the app name doesn't match between those two places.


pkg/server/status_test.go, line 1617 at r8 (raw file):

	}

	for _, respTransaction := range resp.Transactions {

Just realized that your test will still pass if one of the expected app names does not appear in resp.Transactions. In fact it will pass if resp.Transactions is empty. 😛

@arulajmani arulajmani force-pushed the transaction-level-stats branch 2 times, most recently from 730f5f8 to ba1fc91 Compare August 20, 2020 20:44
Collaborator Author

@arulajmani arulajmani left a comment


Made the testing improvements.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @solongordon)


pkg/server/status_test.go, line 1588 at r8 (raw file):

Previously, solongordon (Solon) wrote…

I don't know that this is worth changing but this method of iteration is kind of icky. I'd rather see the test cases as a slice without specifying the app name. Then this part could iterate over that slice, generate the app name, and add the test case to a map. That way you don't have this brittle situation where things break if the app name doesn't match between those two places.

Done.


pkg/server/status_test.go, line 1617 at r8 (raw file):

Previously, solongordon (Solon) wrote…

Just realized that your test will still pass if one of the expected app names does not appear in resp.Transactions. In fact it will pass if resp.Transactions is empty. 😛

Added.


pkg/sql/app_stats.go, line 662 at r7 (raw file):

Previously, solongordon (Solon) wrote…

Oh, I meant why did it move outside the if ok block?

Oops, that's my bad on the rebase; brought it back in!

Contributor

@solongordon solongordon left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/server/status_test.go, line 1588 at r8 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Done.

Yay, nice improvement

@arulajmani arulajmani force-pushed the transaction-level-stats branch from ba1fc91 to 880f6b7 Compare August 20, 2020 22:26
This patch adds the framework to start collecting transaction level
statistics and return them on the `statements` end point. Previously,
the response on the end point was being faked. This patch replaces the
fake response by an actual response with some fields correctly
populated. Other fields will be populated in upcoming patches, and
currently contain garbage data.

The correctly populated fields on `CollectedTransactionStatistics` are:
1. StatementIDs
2. Count
3. ServiceLat
4. App
5. RetryLat
6. MaxRetries
7. CommitLat

Release note: None
@arulajmani arulajmani force-pushed the transaction-level-stats branch from 880f6b7 to 9039cb1 Compare August 20, 2020 23:52
@arulajmani
Collaborator Author

arulajmani commented Aug 21, 2020

So it turns out I was making the assumption that no transactions would have to retry (except the ones which do the force retry), but it's very hard to control for this on testrace. Sort of surprised, because I had stressed locally, but oh well. I've removed that for now; it was a sanity check anyway.

TY for the reviews!

@arulajmani
Collaborator Author

bors r=solongordon

@craig
Contributor

craig bot commented Aug 21, 2020

Build succeeded:

@craig craig bot merged commit 4ab67e2 into cockroachdb:master Aug 21, 2020
arulajmani added a commit to arulajmani/cockroach that referenced this pull request Oct 4, 2020
We recently increased allocations in the `appStats` structure when
adding support for transaction level statistics in cockroachdb#52704. This was
because the interactions with the `fnv` library were expensive in terms
of allocations. This patch aims to claw back the regression by:

- Using our own implementation of the FNV algorithm instead of the
library, which is significantly lighter weight (microbenchmarks below).
- Re-organizes the code to only construct the statement IDs (deemed the
expensive operation) if required.

When comparing the difference between the commit that introduced the
regression and the changes proposed by this diff, I got the following
improvements on the KV workload:

```
name             old ops/s   new ops/s   delta
kv95-throughput  34.5k ± 6%  35.7k ± 4%  +3.42%  (p=0.023 n=10+10)

```

Microbenchmarks for the new hashing algorithm (written/run by @knz):
```
name                     old time/op    new time/op    delta
ConstructStatementID-32     405ns ±17%      39ns ±12%  -90.34%  (p=0.008 n=5+5)

name                     old alloc/op   new alloc/op   delta
ConstructStatementID-32      120B ± 0%       16B ± 0%  -86.67%  (p=0.008 n=5+5)

name                     old allocs/op  new allocs/op  delta
ConstructStatementID-32      6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.008 n=5+5)
```

Closes cockroachdb#54515

Release note: None
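
The "own implementation of the FNV algorithm" mentioned above can be sketched roughly like this. It is an illustrative FNV-1a over a string; the point is that a plain loop avoids the heap-allocated hash.Hash64 and the []byte conversion that the stdlib hash/fnv path requires.

```go
package main

import "fmt"

// Standard FNV-64 parameters (offset basis and prime).
const (
	fnvOffset64 = 14695981039346656037
	fnvPrime64  = 1099511628211
)

// fnv1a64 computes FNV-1a over a string with no allocations: indexing the
// string byte-by-byte avoids converting it to a []byte, and the running
// hash lives in a register rather than behind a hash.Hash64 interface.
func fnv1a64(s string) uint64 {
	h := uint64(fnvOffset64)
	for i := 0; i < len(s); i++ {
		h ^= uint64(s[i])
		h *= fnvPrime64
	}
	return h
}

func main() {
	fmt.Printf("%x\n", fnv1a64("SELECT * FROM t"))
}
```

This produces the same values as the stdlib's fnv.New64a for identical input bytes, so swapping it in does not change which statistics bucket a statement hashes to.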
craig bot pushed a commit that referenced this pull request Oct 7, 2020
55062: kvserver: remove unused parameter r=TheSamHuang a=tbg

Release note: None

55214: sql: fix performance regression due to increased hash allocations r=nvanbenschoten,knz a=arulajmani

We recently increased allocations in the `appStats` structure when
adding support for transaction level statistics in #52704. This was
because the interactions with the `fnv` library were expensive in terms
of allocations. This patch aims to claw back the regression by:

- Using our own implementation of the FNV algorithm instead of the
library, which is significantly lighter weight (microbenchmarks below).
- Re-organizing the code to only construct the statement IDs (deemed the
expensive operation) if required.

When comparing the difference between the commit that introduced the
regression and the changes proposed by this diff, I got the following
improvements on the KV workload:

```
name             old ops/s   new ops/s   delta
kv95-throughput  34.5k ± 6%  35.7k ± 4%  +3.42%  (p=0.023 n=10+10)

```

Microbenchmarks for the new hashing algorithm (written/run by @knz):
```
name                     old time/op    new time/op    delta
ConstructStatementID-32     405ns ±17%      39ns ±12%  -90.34%  (p=0.008 n=5+5)

name                     old alloc/op   new alloc/op   delta
ConstructStatementID-32      120B ± 0%       16B ± 0%  -86.67%  (p=0.008 n=5+5)

name                     old allocs/op  new allocs/op  delta
ConstructStatementID-32      6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.008 n=5+5)
```

Closes #54515

Release note: None

55277: server: create engines in NewServer r=irfansharif a=tbg

We were jumping through a number of hoops to create the engines only in
`(*Server).Start` since that seems to be the "idiomatic" place to start
moving parts. However, it creates a lot of complexity since various
callbacks have to be registered with access to engines. Move engine
creation to `NewServer`.

This unblocks #54936.

Release note: None


55280: roachtest: recognize GEOS dlls on more platforms r=otan a=knz

This makes roachtest work on the BSDs again.

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: arulajmani <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>