
sql: add support for transaction level stats collection #52704

Merged
merged 2 commits on Aug 21, 2020

Conversation

arulajmani
Collaborator

@arulajmani arulajmani commented Aug 12, 2020

This patch adds the framework to start collecting transaction level
statistics and return them on the statements end point. Previously,
the response on the end point was being faked. This patch replaces the
fake response by an actual response with some fields correctly
populated. Other fields will be populated in upcoming patches, and
currently contain garbage data.

The correctly populated fields on CollectedTransactionStatistics are:

  1. StatementIDs
  2. Count
  3. ServiceLat
  4. App
  5. RetryLat
  6. MaxRetries
  7. CommitLat

Release note: None


Contributor

@solongordon solongordon left a comment


Looks like a good start! I want to be sensitive to the fact that transactions might have a very large number of statements in them in pathological cases. (Not even especially pathological. Netflix cited a real example where a user had a transaction with 100,000 insert statements in it. This is not advisable but also a realistic thing that users might do.)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/server/statements.go, line 90 at r2 (raw file):

	// TODO(arul): This assigning IDs should probably happen when we store
	//  the statements in the statsCollector or when we get them out of the
	//  stats collector.

Agree. We need to ensure that these IDs match the ones in the transaction stats, so ideally we would only generate this hash once and use it in both places.


pkg/sql/app_stats.go, line 48 at r2 (raw file):

// txnKey is the concatenation of all stmtKey.id's for all the statements in the
// transaction
type txnKey string

I'm a bit concerned about the potential for giant strings here. (Imagine the case where a transaction has 100k statements in it.) What if we made this a hash of all the statement fingerprints?
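
For illustration, the hashing approach suggested here could look like the following Go sketch. The function name and shape are hypothetical, not CockroachDB's actual code; a real implementation would hash the same fingerprints the stats collector already has.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// txnFingerprint sketches the suggestion above: rather than concatenating
// every statement fingerprint into one potentially giant string, fold each
// fingerprint into a fixed-size 64-bit FNV hash.
func txnFingerprint(stmtFingerprints []string) uint64 {
	h := fnv.New64()
	for _, fp := range stmtFingerprints {
		_, _ = h.Write([]byte(fp)) // Write on an fnv hash never returns an error.
		_, _ = h.Write([]byte{0})  // separator so ["ab"] and ["a", "b"] hash differently
	}
	return h.Sum64()
}

func main() {
	key := txnFingerprint([]string{"INSERT INTO t VALUES (_)", "SELECT * FROM t WHERE k = _"})
	fmt.Printf("txnKey: %x\n", key) // fixed 8 bytes regardless of statement count
}
```

The key stays 8 bytes even for a 100k-statement transaction, and it remains order-sensitive, so transactions with the same statements in a different order still bucket separately.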


pkg/sql/conn_executor.go, line 1016 at r2 (raw file):

		// transactionStatements tracks all statements that make up the current
		// transaction.
		transactionStatements []*Statement

Is it possible we already have this data somewhere? It feels like it must be necessary to support transaction retries.

If we do need to add this, it seems like all you really need is the statement ID, so holding a pointer to the entire statement feels excessive.

@arulajmani arulajmani force-pushed the transaction-level-stats branch from 0dc35d3 to 8434e73 Compare August 18, 2020 18:19
Collaborator Author

@arulajmani arulajmani left a comment


Still have one outstanding comment left to address and a couple of TODOs in the code, but I think this is in a place where I can un-WIP this to get some more feedback.

Also left an inline comment about the structure of the response I'd like clarification on.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, solongordon (Solon) wrote…

Agree. We need to ensure that these IDs match the ones in the transaction stats, so ideally we would only generate this hash once and use it in both places.

Is there a reason we have this ID field on roachpb.StatementStatisticsKey instead of roachpb.StatementStatistics? I think it would be better to have the ID on the value (roachpb.StatementStatistics) because ID is a derived field and not part of the key used to bucket statements.

Also I'm not completely sure why there is a need to organize this stuff as CollectedStatementStatistics which contains StatementStatisticsKey and StatementStatistics. Whatever the reason may be, I wonder if that will apply to transaction statistics as well. Do we need a CollectedTransactionStatistics which contains a TransactionStatisticsKey and TransactionStatistics? Currently we only have a flat TransactionStatistics structure in the response.


pkg/sql/app_stats.go, line 48 at r2 (raw file):

Previously, solongordon (Solon) wrote…

I'm a bit concerned about the potential for giant strings here. (Imagine the case where a transaction has 100k statements in it.) What if we made this a hash of all the statement fingerprints?

Done.

@arulajmani arulajmani marked this pull request as ready for review August 18, 2020 18:23
@arulajmani arulajmani requested a review from solongordon August 18, 2020 18:23
@arulajmani arulajmani force-pushed the transaction-level-stats branch from 8434e73 to dc0ac2b Compare August 18, 2020 19:23
Contributor

@solongordon solongordon left a comment


Reviewed 3 of 19 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/roachpb/app_stats.proto, line 121 at r4 (raw file):

  // CommitLat is the amount of time required to commit the transaction after
  // all statement operations have been applied.
  optional NumericStat commit_lat = 8 [(gogoproto.nullable) = false];

Seems like this isn't populated yet, though it's fine with me to save that for a follow-up PR.


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Is there a reason we have this ID field on roachpb.StatementStatisticsKey instead of roachpb.StatementStatistics? I think it would be better to have the ID on the value (roachpb.StatementStatistics) because ID is a derived field and not part of the key used to bucket statements.

Also I'm not completely sure why there is a need to organize this stuff as CollectedStatementStatistics which contains StatementStatisticsKey and StatementStatistics. Whatever the reason may be, I wonder if that will apply to transaction statistics as well. Do we need a CollectedTransactionStatistics which contains a TransactionStatisticsKey and TransactionStatistics? Currently we only have a flat TransactionStatistics structure in the response.

I see what you mean about having ID in the Key but putting it in the value feels weird to me too. What about adding it to CollectedStatementStatistics?

I would guess the reason for CollectedStatementStatistics is just that protobuf doesn't have a map datatype, so someone created that type to represent the key-value pairs. We might as well mimic that for transactions to keep things analogous I suppose.


pkg/sql/app_stats.go, line 48 at r4 (raw file):

// txnKey is the concatenation of all stmtKey.id's for all the statements in the
// transaction

Out of date comment


pkg/sql/app_stats.go, line 77 at r4 (raw file):

	mu struct {
		syncutil.Mutex
		// TODO(arul): Can we rename this without breaking stuff?

What specifically do you want to rename? I think you can rename most anything except protobuf rpc names.


pkg/sql/app_stats.go, line 504 at r4 (raw file):

}

func constructStatementID(queryFingerprint string) string {

As you pointed out, it seems like we need to take the entire statement key into account here to avoid conflicts.


pkg/sql/backfill.go, line 150 at r4 (raw file):

// This operates over multiple goroutines concurrently and is thus not
// able to reuse the original kv.Txn safely. The various
// function that it calls make their own txnCounts.

I think you had a rename go rogue here. Lots of unintended txn -> txnCounts replacements.


pkg/sql/conn_executor.go, line 1016 at r2 (raw file):

Previously, solongordon (Solon) wrote…

Is it possible we already have this data somewhere? It feels like it must be necessary to support transaction retries.

If we do need to add this, it seems like all you really need is the statement ID, so holding a pointer to the entire statement feels excessive.

This comment still stands.

Contributor

@solongordon solongordon left a comment


By the way, you should run make bin/protoc-gen-gogoroach to avoid the churn in mvcc3.pb.go and timestamp.pb.go due to the Go 1.13 downgrade.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)

@arulajmani arulajmani force-pushed the transaction-level-stats branch 2 times, most recently from 0d0f549 to 4c7afcb Compare August 19, 2020 21:50
Collaborator Author

@arulajmani arulajmani left a comment


Made some more changes on the general structure and addressed all outstanding comments. Should be RFAL.

There are a few fields left unpopulated for now; I'll either update this branch or do them as follow-ups to this PR, depending on how soon we are able to merge this big change!

Reviewed 1 of 23 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @solongordon)


pkg/roachpb/app_stats.proto, line 121 at r4 (raw file):

Previously, solongordon (Solon) wrote…

Seems like this isn't populated yet, though it's fine with me to save that for a follow-up PR.

Realized I had the wrong list of fields populated in my commit desc, fixed that now! I'll try and push this (and anything else left) on this branch, but if this gets approved before, I'll save it for a follow-up.


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, solongordon (Solon) wrote…

I see what you mean about having ID in the Key but putting it in the value feels weird to me too. What about adding it to CollectedStatementStatistics?

I would guess the reason for CollectedStatementStatistics is just that protobuf doesn't have a map datatype, so someone created that type to represent the key-value pairs. We might as well mimic that for transactions to keep things analogous I suppose.

I'm adding this to CollectedStatementStatistics for now. How do you feel about de-coupling this proto structure from this internal map (appStats.stmts) that we have? I don't see a good reason to have this stuff mirrored.

We should still have CollectedStatementStatistics which contains StatementStatistics inside it, but we should get rid of StatementStatisticsKey. All the fields on the key right now should be added to CollectedStatementStatistics as flat fields, as they are immutable (including the StatementID).

The reason I'm proposing this de-coupling is because I want to change the appStats.stmts map's key. Ideally, we should use the statement ID as the key, as it is a hash of all the fields currently in the stmtKey. Right now, I'm storing the Statement ID as a value (unprotected by a mutex, as it is only set on creation). Ideally, I'd flip this around -- the fields on stmtKey would be stored as immutable fields and the Statement ID would serve as the key. By making this change, we guard ourselves against future code that wants to add/remove axes for statement bucketing. Adding a new axis will require changing how the Statement ID is constructed, which will preserve the 1:1 mapping between an ID recorded by a txn and IDs corresponding to statements. Probably won't do this as part of this PR, as it's slightly involved, but I'm in favour of decoupling the map and protobuf here. Lemme know what you think!
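
A rough sketch of the "hash the entire statement key" idea discussed here, with hypothetical field names (the real stmtKey in pkg/sql has its own set of bucketing fields):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
)

// stmtKeyFields stands in for the axes a statement is bucketed by;
// these fields are illustrative, not CockroachDB's actual stmtKey.
type stmtKeyFields struct {
	queryFingerprint string
	implicitTxn      bool
	failed           bool
}

// constructStatementID hashes every bucketing axis, not just the query
// fingerprint, so two statements that differ only in a flag get distinct IDs.
func constructStatementID(k stmtKeyFields) string {
	h := fnv.New64()
	_, _ = h.Write([]byte(k.queryFingerprint))
	_, _ = h.Write([]byte(strconv.FormatBool(k.implicitTxn)))
	_, _ = h.Write([]byte(strconv.FormatBool(k.failed)))
	return strconv.FormatUint(h.Sum64(), 16)
}

func main() {
	a := constructStatementID(stmtKeyFields{"SELECT * FROM t", true, false})
	b := constructStatementID(stmtKeyFields{"SELECT * FROM t", false, false})
	fmt.Println(a == b) // same fingerprint, different bucketing axis, different ID
}
```

Because the ID is derived from all key fields, adding a new bucketing axis later only requires feeding one more field into the hash, keeping statement IDs and transaction-recorded IDs in sync.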


pkg/sql/app_stats.go, line 48 at r4 (raw file):

Previously, solongordon (Solon) wrote…

Out of date comment

Done.


pkg/sql/app_stats.go, line 77 at r4 (raw file):

Previously, solongordon (Solon) wrote…

What specifically do you want to rename? I think you can rename most anything except protobuf rpc names.

I was thinking of renaming this to TxnCounts to make it in line with the enclosing struct.

I'm open to suggestions on how we can rename this (and the enclosing struct) now that we have these new transaction statistics. transactionCounts is sort of correct but not really: this struct captures very coarse-grained transaction stats, like the number of transactions started, the number committed, the number that were implicit, and the average time for all transactions.


pkg/sql/app_stats.go, line 504 at r4 (raw file):

Previously, solongordon (Solon) wrote…

As you pointed out, it seems like we need to take the entire statement key into account here to avoid conflicts.

Done.


pkg/sql/backfill.go, line 150 at r4 (raw file):

Previously, solongordon (Solon) wrote…

I think you had a rename go rogue here. Lots of unintended txn -> txnCounts replacements.

:( :( reverted, hopefully I didn't miss any.


pkg/sql/conn_executor.go, line 1016 at r2 (raw file):

Previously, solongordon (Solon) wrote…

This comment still stands.

Looks like we have this txnRewindPos, but it's only defined when the txn is open. This txnRewindPos refers to the position in the buffer. Maybe we could use it, but we'd have to extend it to ensure the buffer isn't cleared when we reach the transaction recording stuff. Seems like quite a lot of effort for too little reward, so I'm taking your second recommendation of recording just the statement IDs. This way, we only calculate this statement ID stuff once, which I quite like.
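
The approach settled on here can be sketched as follows: the executor keeps only the IDs of statements run in the current transaction, not pointers to whole Statement objects. Type and field names are illustrative, not the actual conn_executor code.

```go
package main

import "fmt"

// stmtID stands in for the hashed statement fingerprint ID.
type stmtID string

// txnStatsCollector accumulates just the statement IDs for the current
// transaction, which is all the transaction-level stats need.
type txnStatsCollector struct {
	statementIDs []stmtID
}

// recordStatement appends the already-computed ID, so the hash is
// calculated once per statement execution.
func (t *txnStatsCollector) recordStatement(id stmtID) {
	t.statementIDs = append(t.statementIDs, id)
}

// reset clears the collector when the transaction finishes, reusing
// the underlying slice storage.
func (t *txnStatsCollector) reset() {
	t.statementIDs = t.statementIDs[:0]
}

func main() {
	var ts txnStatsCollector
	ts.recordStatement("a1b2")
	ts.recordStatement("c3d4")
	fmt.Println(len(ts.statementIDs)) // 2
	ts.reset()
	fmt.Println(len(ts.statementIDs)) // 0
}
```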

@arulajmani arulajmani requested a review from solongordon August 19, 2020 21:54
@arulajmani arulajmani force-pushed the transaction-level-stats branch from 4c7afcb to 6caf140 Compare August 19, 2020 21:55
To support adding transaction-level statistics to the webui, I modified
the existing StatementsResponse to include a new `transactions` field.
Transactions here are defined by the collection of statement
fingerprints within. To allow referencing statements, I added a
statement ID field, which is a hash of the statement fingerprint.

For the moment, transactions in the response are faked out by combining
together statement IDs at random. This is meant to unblock development
on the frontend while we work on populating the response with the real
transaction data.

Release note: None
@arulajmani arulajmani force-pushed the transaction-level-stats branch from 6caf140 to c40d784 Compare August 19, 2020 22:43
@arulajmani
Collaborator Author

Rebased master to get rid of some merge conflicts.

@arulajmani arulajmani force-pushed the transaction-level-stats branch from c40d784 to 6bc3fb2 Compare August 20, 2020 03:58
@arulajmani
Collaborator Author

Added tracking for commit latency as well. The test I wrote is failing on stress though; I'll debug that tomorrow.

@arulajmani arulajmani force-pushed the transaction-level-stats branch from 6bc3fb2 to b9710fd Compare August 20, 2020 14:54
Contributor

@solongordon solongordon left a comment


Reviewed 5 of 14 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/roachpb/app_stats.proto, line 158 at r7 (raw file):

message CollectedStatementStatistics {
  // ID is a hash of the statement key (query fingerprint, vectorized or not,
  // implicit txn or not, etc) which can be used to identify the statement

Nit: Turns out vectorized is not considered here, right?


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I'm adding this to CollectedStatementStatistics for now. How do you feel about de-coupling this proto structure from this internal map (appStats.stmts) that we have? I don't see a good reason to have this stuff mirrored.

We should still have CollectedStatementStatistics which contains StatementStatistics inside it, but we should get rid of StatementStatisticsKey. All the fields on the key right now should be added to CollectedStatementStatistics as flat fields, as they are immutable (including the StatementID).

The reason I'm proposing this de-coupling is because I want to change the appStats.stmts map's key. Ideally, we should use the statement ID as the key, as it is a hash of all the fields currently in the stmtKey. Right now, I'm storing the Statement ID as a value (unprotected by a mutex, as it is only set on creation). Ideally, I'd flip this around -- the fields on stmtKey would be stored as immutable fields and the Statement ID would serve as the key. By making this change, we guard ourselves against future code that wants to add/remove axes for statement bucketing. Adding a new axis will require changing how the Statement ID is constructed, which will preserve the 1:1 mapping between an ID recorded by a txn and IDs corresponding to statements. Probably won't do this as part of this PR, as it's slightly involved, but I'm in favour of decoupling the map and protobuf here. Lemme know what you think!

I'm fine with them being de-coupled though I agree we shouldn't change this as part of this PR. Let's revisit whether it makes sense to use the hash as a key here or not.


pkg/server/status_test.go, line 1615 at r7 (raw file):

	sort.Ints(expectedCounts)

	if !reflect.DeepEqual(respCounts, expectedCounts) {

I wonder if you can find a way to make this test a bit more robust and debuggable so you're not just comparing a list of counts. A random idea I had was that when you're running the queries, you could set a different application name for each transaction fingerprint to make it easier to distinguish them when verifying the results. Maybe you can come up with something better.


pkg/sql/app_stats.go, line 77 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I was thinking of renaming this to TxnCounts to make it in line with the enclosing struct.

I'm open to suggestions on how we can rename this (and the enclosing struct) now that we have these new transaction statistics. transactionCounts is sort of correct but not really: this struct captures very coarse-grained transaction stats, like the number of transactions started, the number committed, the number that were implicit, and the average time for all transactions.

That makes sense. Let's come back to it in a subsequent PR.


pkg/sql/app_stats.go, line 169 at r7 (raw file):

	createIfNonExistent := true
	// If the statement is below the latency threshold, we don't need to create
	// an entry in the stmts map for it.

Hmm, this is interesting. It seems like if this setting is enabled, we may have statement IDs which don't have a corresponding entry in the statement statistics. That could cause trouble in the UI. Don't worry about addressing that in this PR but I think we'll have to revisit this and think about a solution.


pkg/sql/app_stats.go, line 662 at r7 (raw file):

			distSQLUsed := stats.mu.distSQLUsed
			vectorized := stats.mu.vectorized
			stats.mu.Unlock()

Why did this change?


pkg/sql/conn_executor.go, line 1016 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Looks like we have this txnRewindPos, but it's only defined when the txn is open. This txnRewindPos refers to the position in the buffer. Maybe we could use it, but we'd have to extend it to ensure the buffer isn't cleared when we reach the transaction recording stuff. Seems like quite a lot of effort for too little reward, so I'm taking your second recommendation of recording just the statement IDs. This way, we only calculate this statement ID stuff once, which I quite like.

Works for me!


pkg/sql/conn_executor_exec.go, line 1431 at r7 (raw file):

// recordTransactionStart records the start of the transaction and returns
// closures to be called once the transaction finishes or if they transaction

s/they/the/

@arulajmani arulajmani force-pushed the transaction-level-stats branch 2 times, most recently from 31e6291 to db3cb4f Compare August 20, 2020 19:07
Collaborator Author

@arulajmani arulajmani left a comment


Updated the test, addressed all comments, and fixed a data race caught by CI. Should be RFAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @solongordon)


pkg/roachpb/app_stats.proto, line 158 at r7 (raw file):

Previously, solongordon (Solon) wrote…

Nit: Turns out vectorized is not considered here, right?

Yeah, this changed when I rebased! Fixed


pkg/server/statements.go, line 90 at r2 (raw file):

Previously, solongordon (Solon) wrote…

I'm fine with them being de-coupled though I agree we shouldn't change this as part of this PR. Let's revisit whether it makes sense to use the hash as a key here or not.

Makes sense. I've left a TODO for myself where we define the in-memory data structure to consider this.


pkg/server/status_test.go, line 1615 at r7 (raw file):

Previously, solongordon (Solon) wrote…

I wonder if you can find a way to make this test a bit more robust and debuggable so you're not just comparing a list of counts. A random idea I had was that when you're running the queries, you could set a different application name for each transaction fingerprint to make it easier to distinguish them when verifying the results. Maybe you can come up with something better.

Updated the test by tying queries to separate app names. This test seems much better now, lemme know what you think!


pkg/sql/app_stats.go, line 169 at r7 (raw file):

Previously, solongordon (Solon) wrote…

Hmm, this is interesting. It seems like if this setting is enabled, we may have statement IDs which don't have a corresponding entry in the statement statistics. That could cause trouble in the UI. Don't worry about addressing that in this PR but I think we'll have to revisit this and think about a solution.

Yeah, but it's important to get the statementID so that we don't bucket together transactions which are identical except for a statement that doesn't meet the threshold. :/


pkg/sql/app_stats.go, line 662 at r7 (raw file):

Previously, solongordon (Solon) wrote…

Why did this change?

Result of a rebase -- I think Radu recently moved vectorized and distSQLUsed onto the value of the statement statistics map. The reasoning is that we don't expect this stuff to change much from one execution to another. This changed the semantics of key construction, and vectorized/distSQLUsed now refer to the last execution of that key. I added this mu struct for fields that change from one exec to the next (all fields bar statementID).


pkg/sql/conn_executor_exec.go, line 1431 at r7 (raw file):

Previously, solongordon (Solon) wrote…

s/they/the/

Done.

@arulajmani arulajmani requested a review from solongordon August 20, 2020 19:08
Contributor

@solongordon solongordon left a comment


:lgtm:

Reviewed 4 of 5 files at r8.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/server/status_test.go, line 1615 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Updated the test by tying queries to separate app names. This test seems much better now, lemme know what you think!

Better, thanks!


pkg/sql/app_stats.go, line 169 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Yeah, but it's important to get the statementID so that we don't bucket transactions which are equal bar a statement that doesn't meet the threshold. :/

Agree.


pkg/sql/app_stats.go, line 662 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Result of a rebase -- I think Radu recently moved vectorized and distSQLUsed onto the value of the statement statistics map. The reasoning is that we don't expect this stuff to change much from one execution to another. This changed the semantics of key construction, and vectorized/distSQLUsed now refer to the last execution of that key. I added this mu struct for fields that change from one exec to the next (all fields bar statementID).

Oh, I meant why did it move outside the if ok block?

Contributor

@solongordon solongordon left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/server/status_test.go, line 1588 at r8 (raw file):

	for i := 1; i <= len(testCases); i++ {
		appName := fmt.Sprintf("app%d", i)
		tc := testCases[appName]

I don't know that this is worth changing but this method of iteration is kind of icky. I'd rather see the test cases as a slice without specifying the app name. Then this part could iterate over that slice, generate the app name, and add the test case to a map. That way you don't have this brittle situation where things break if the app name doesn't match between those two places.


pkg/server/status_test.go, line 1617 at r8 (raw file):

	}

	for _, respTransaction := range resp.Transactions {

Just realized that your test will still pass if one of the expected app names does not appear in resp.Transactions. In fact it will pass if resp.Transactions is empty. 😛

@arulajmani arulajmani force-pushed the transaction-level-stats branch 2 times, most recently from 730f5f8 to ba1fc91 Compare August 20, 2020 20:44
Collaborator Author

@arulajmani arulajmani left a comment


Made the testing improvements.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @solongordon)


pkg/server/status_test.go, line 1588 at r8 (raw file):

Previously, solongordon (Solon) wrote…

I don't know that this is worth changing but this method of iteration is kind of icky. I'd rather see the test cases as a slice without specifying the app name. Then this part could iterate over that slice, generate the app name, and add the test case to a map. That way you don't have this brittle situation where things break if the app name doesn't match between those two places.

Done.


pkg/server/status_test.go, line 1617 at r8 (raw file):

Previously, solongordon (Solon) wrote…

Just realized that your test will still pass if one of the expected app names does not appear in resp.Transactions. In fact it will pass if resp.Transactions is empty. 😛

Added.


pkg/sql/app_stats.go, line 662 at r7 (raw file):

Previously, solongordon (Solon) wrote…

Oh, I meant why did it move outside the if ok block?

Oops, that's my bad on the rebase; brought it back in!

Contributor

@solongordon solongordon left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @solongordon)


pkg/server/status_test.go, line 1588 at r8 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Done.

Yay, nice improvement

@arulajmani arulajmani force-pushed the transaction-level-stats branch from ba1fc91 to 880f6b7 Compare August 20, 2020 22:26
This patch adds the framework to start collecting transaction level
statistics and return them on the `statements` end point. Previously,
the response on the end point was being faked. This patch replaces the
fake response by an actual response with some fields correctly
populated. Other fields will be populated in upcoming patches, and
currently contain garbage data.

The correctly populated fields on `CollectedTransactionStatistics` are:
1. StatementIDs
2. Count
3. ServiceLat
4. App
5. RetryLat
6. MaxRetries
7. CommitLat

Release note: None
@arulajmani arulajmani force-pushed the transaction-level-stats branch from 880f6b7 to 9039cb1 Compare August 20, 2020 23:52
@arulajmani
Collaborator Author

arulajmani commented Aug 21, 2020

So it turns out I was making the assumption that no transactions would have to retry (except the ones which do the force retry), but it's very hard to control for this on testrace. Sort of surprised, because I had stressed locally, but oh well. I've removed that for now; it was a sanity check anyway.

TY for the reviews!

@arulajmani
Collaborator Author

bors r=solongordon

@craig
Contributor

craig bot commented Aug 21, 2020

Build succeeded:

@craig craig bot merged commit 4ab67e2 into cockroachdb:master Aug 21, 2020
arulajmani added a commit to arulajmani/cockroach that referenced this pull request Oct 4, 2020
We recently increased allocations in the `appStats` structure when
adding support for transaction level statistics in cockroachdb#52704. This was
because the interactions with the `fnv` library were expensive in terms
of allocations. This patch aims to claw back the regression by:

- Using our own implementation of the FNV algorithm instead of the
library, which is significantly lighter weight (microbenchmarks below).
- Re-organizes the code to only construct the statement IDs (deemed the
expensive operation) if required.

When comparing the difference between the commit that introduced the
regression and the changes proposed by this diff, I got the following
improvements on the KV workload:

```
name             old ops/s   new ops/s   delta
kv95-throughput  34.5k ± 6%  35.7k ± 4%  +3.42%  (p=0.023 n=10+10)

```

Microbenchmarks for the new hashing algorithm (written/run by @knz):
```
name                     old time/op    new time/op    delta
ConstructStatementID-32     405ns ±17%      39ns ±12%  -90.34%  (p=0.008 n=5+5)

name                     old alloc/op   new alloc/op   delta
ConstructStatementID-32      120B ± 0%       16B ± 0%  -86.67%  (p=0.008 n=5+5)

name                     old allocs/op  new allocs/op  delta
ConstructStatementID-32      6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.008 n=5+5)
```

Closes cockroachdb#54515

Release note: None
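
The "own implementation of the FNV algorithm" mentioned above can be sketched roughly like this. It is an illustrative FNV-1a over a string; the point is that a plain loop avoids the heap-allocated hash.Hash64 and the []byte conversion that the stdlib hash/fnv path requires.

```go
package main

import "fmt"

// Standard FNV-64 parameters (offset basis and prime).
const (
	fnvOffset64 = 14695981039346656037
	fnvPrime64  = 1099511628211
)

// fnv1a64 computes FNV-1a over a string with no allocations: indexing the
// string byte-by-byte avoids converting it to a []byte, and the running
// hash lives in a register rather than behind a hash.Hash64 interface.
func fnv1a64(s string) uint64 {
	h := uint64(fnvOffset64)
	for i := 0; i < len(s); i++ {
		h ^= uint64(s[i])
		h *= fnvPrime64
	}
	return h
}

func main() {
	fmt.Printf("%x\n", fnv1a64("SELECT * FROM t"))
}
```

This produces the same values as the stdlib's fnv.New64a for identical input bytes, so swapping it in does not change which statistics bucket a statement hashes to.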
craig bot pushed a commit that referenced this pull request Oct 7, 2020
55062: kvserver: remove unused parameter r=TheSamHuang a=tbg

Release note: None

55214: sql: fix performance regression due to increased hash allocations r=nvanbenschoten,knz a=arulajmani

We recently increased allocations in the `appStats` structure when
adding support for transaction level statistics in #52704. This was
because the interactions with the `fnv` library were expensive in terms
of allocations. This patch aims to claw back the regression by:

- Using our own implementation of the FNV algorithm instead of the
library, which is significantly lighter weight (microbenchmarks below).
- Re-organizing the code to only construct the statement IDs (deemed the
expensive operation) if required.

When comparing the difference between the commit that introduced the
regression and the changes proposed by this diff, I got the following
improvements on the KV workload:

```
name             old ops/s   new ops/s   delta
kv95-throughput  34.5k ± 6%  35.7k ± 4%  +3.42%  (p=0.023 n=10+10)

```

Microbenchmarks for the new hashing algorithm (written/run by @knz):
```
name                     old time/op    new time/op    delta
ConstructStatementID-32     405ns ±17%      39ns ±12%  -90.34%  (p=0.008 n=5+5)

name                     old alloc/op   new alloc/op   delta
ConstructStatementID-32      120B ± 0%       16B ± 0%  -86.67%  (p=0.008 n=5+5)

name                     old allocs/op  new allocs/op  delta
ConstructStatementID-32      6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.008 n=5+5)
```

Closes #54515

Release note: None

55277: server: create engines in NewServer r=irfansharif a=tbg

We were jumping through a number of hoops to create the engines only in
`(*Server).Start` since that seems to be the "idiomatic" place to start
moving parts. However, it creates a lot of complexity since various
callbacks have to be registered with access to engines. Move engine
creation to `NewServer`.

This unblocks #54936.

Release note: None


55280: roachtest: recognize GEOS dlls on more platforms r=otan a=knz

This makes roachtest work on the BSDs again.

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: arulajmani <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>