Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: allow kvBatchSize to be customized #40850

Closed
wants to merge 1 commit into from

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Sep 17, 2019

This commit allows the kvBatchSize to be customized per txnKVFetcher
so that CREATE STATISTICS jobs can use a smaller batch size. Prior to
this commit, kvBatchSize was always 10000 rows for all queries. It
could be customized using a function SetKVBatchSize, but that function
is only useful for testing because it changes the value of kvBatchSize
at a global level and is not thread safe. This commit makes it possible
to change the batch size locally, and sets the default kvBatchSize for
CREATE STATISTICS to 1000.

This is necessary to minimize the risk of out-of-memory errors when
running CREATE STATISTICS on tables with very wide columns.

Fixes #33660

Release note: None
Release justification: The potential for OOM due to CREATE STATISTICS
makes this category 3: Fixes for high-priority or high-severity
bugs in existing functionality.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rytaft
Copy link
Collaborator Author

rytaft commented Sep 18, 2019

Whoops this is buggy -- please hold off on reviewing

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok -- confirmed that this reduces memory utilization so no machine exceeds utilization of 7 GB during the hotspotsplits/nodes=4 test.

Ready for review

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, just some nits. We should run the tpcc roachtest a few times with and without the change to make sure there is no regression in the # of warehouses. Also run some CREATE STATISTICS on a large table to see if there is a lot of perf difference. Thanks!

[nit] we had (and still have) kvBatchSize but all the new parameters are called "batch limit". We should keep them consistent (I'd keep "size" for both).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rytaft)


pkg/sql/distsql_plan_stats.go, line 41 at r1 (raw file):

const histogramSamples = 10000
const histogramBuckets = 200
const kvBatchLimit = 1000

rename to statsKVBatchLimit (sql.kvBatchLimit suggests something else)


pkg/sql/execinfra/rowfetcher.go, line 83 at r1 (raw file):

		ValNeededForCol:  valNeededForCol,
	}
	if err := fetcher.Init(

Any reason we can't pass it to Init?
I'd also consider renaming kvBatchSize to defaultKVBatchSize and using that instead of 0 everywhere.


pkg/sql/row/kv_batch_fetcher.go, line 36 at r1 (raw file):

// SetKVBatchSize changes the kvBatchFetcher batch size, and returns a function that restores it.
func SetKVBatchSize(val int64) func() {

This could be SetDefaultKVBatchSize and have it just change the default, rather than always overriding.

@rytaft rytaft force-pushed the memory branch 2 times, most recently from 50e5095 to ab99ab7 Compare September 18, 2019 17:21
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! Updated to use batchSize instead of batchLimit.

I'll run some benchmarks now to check performance.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/distsql_plan_stats.go, line 41 at r1 (raw file):

Previously, RaduBerinde wrote…

rename to statsKVBatchLimit (sql.kvBatchLimit suggests something else)

Done.


pkg/sql/execinfra/rowfetcher.go, line 83 at r1 (raw file):

Any reason we can't pass it to Init?

Init is called by about 20 other functions, so it just explodes the number of places we'd need to plumb through the batchSize param. I can do it if you think that would be cleaner, but I was just trying to keep the footprint of this PR as small as possible...

I'd also consider renaming kvBatchSize to defaultKVBatchSize and using that instead of 0 everywhere.

Do you think I should export defaultKVBatchSize? The places I'm passing 0 are not all in the same package.


pkg/sql/row/kv_batch_fetcher.go, line 36 at r1 (raw file):

Previously, RaduBerinde wrote…

This could be SetDefaultKVBatchSize and have it just change the default, rather than always overriding.

The problem is that create_stats_job_test.go is using this function to set defaultKVBatchSize to 10, and I want to make sure that it overrides statsKVBatchSize. Do you think it would be better to instead have a SetStatsKVBatchSize function to override statsKVBatchSize?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rytaft)


pkg/sql/execinfra/rowfetcher.go, line 83 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Any reason we can't pass it to Init?

Init is called by about 20 other functions, so it just explodes the number of places we'd need to plumb through the batchSize param. I can do it if you think that would be cleaner, but I was just trying to keep the footprint of this PR as small as possible...

I'd also consider renaming kvBatchSize to defaultKVBatchSize and using that instead of 0 everywhere.

Do you think I should export defaultKVBatchSize? The places I'm passing 0 are not all in the same package.

Ah, separate function is ok then. I would export the constant.


pkg/sql/row/fetcher.go, line 47 at r2 (raw file):

		batchResponse []byte, origSpan roachpb.Span, err error)
	GetRangesInfo() []roachpb.RangeInfo
	SetBatchSize(batchSize int64)

A comment here would be good.


pkg/sql/row/kv_batch_fetcher.go, line 36 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

The problem is that create_stats_job_test.go is using this function to set defaultKVBatchSize to 10, and I want to make sure that it overrides statsKVBatchSize. Do you think it would be better to instead have a SetStatsKVBatchSize function to override statsKVBatchSize?

Oh, yeah, SetStatsKVBatchSize makes more sense to me


pkg/sql/row/kv_batch_fetcher.go, line 90 at r2 (raw file):

var _ kvBatchFetcher = &txnKVFetcher{}

func (f *txnKVFetcher) SetBatchSize(batchSize int64) {

[nit] SetBatchSize is part of the kvBatchFetcher interface

@rytaft rytaft force-pushed the memory branch 4 times, most recently from 50c7fba to 2109656 Compare September 18, 2019 18:54
@rytaft rytaft requested a review from a team as a code owner September 18, 2019 18:54
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/execinfra/rowfetcher.go, line 83 at r1 (raw file):

Previously, RaduBerinde wrote…

Ah, separate function is ok then. I would export the constant.

Done.


pkg/sql/row/fetcher.go, line 47 at r2 (raw file):

Previously, RaduBerinde wrote…

A comment here would be good.

Done.


pkg/sql/row/kv_batch_fetcher.go, line 36 at r1 (raw file):

Previously, RaduBerinde wrote…

Oh, yeah, SetStatsKVBatchSize makes more sense to me

Done.


pkg/sql/row/kv_batch_fetcher.go, line 90 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] SetBatchSize is part of the kvBatchFetcher interface

Done.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @rytaft)


pkg/sql/distsql_plan_stats.go, line 54 at r3 (raw file):

)

var statsKVBatchSize int64 = 1000

[nit] Add a comment explaining that we want a lower limit for stats to reduce memory usage. And maybe a TODO about improving the SQL-KV interface.

This commit allows the kvBatchSize to be customized per txnKVFetcher
so that CREATE STATISTICS jobs can use a smaller batch size. Prior to
this commit, kvBatchSize was always 10000 rows for all queries. It
could be customized using a function SetKVBatchSize, but that function
is only useful for testing because it changes the value of kvBatchSize
at a global level and is not thread safe. This commit makes it possible
to change the batch size locally, and sets the default kvBatchSize for
CREATE STATISTICS to 1000.

This is necessary to minimize the risk of out-of-memory errors when
running CREATE STATISTICS on tables with very wide columns.

Fixes cockroachdb#33660

Release note: None
Release justification: The potential for OOM due to CREATE STATISTICS
makes this category 3: Fixes for high-priority or high-severity
bugs in existing functionality.
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis)


pkg/sql/distsql_plan_stats.go, line 54 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] Add a comment explaining that we want a lower limit for stats to reduce memory usage. And maybe a TODO about improving the SQL-KV interface.

Done.

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @rytaft)


pkg/sql/distsql_physical_planner.go, line 871 at r3 (raw file):

// corresponds to a scanNode, except for the Spans and OutputColumns.
func initTableReaderSpec(
	n *scanNode, planCtx *PlanningCtx, indexVarMap []int, batchSize int64,

From an API point of view, it might be better to just have the caller set the BatchSize field. That way, every caller doesn't have to pass an extra parameter, and probably end up with different values over time, even though they just want the default. I often like to follow a pattern of only having the "required" (or almost-always-used) parameters passed to constructors, and then any "optional" parameters get set after initial construction. See optbuilder.Builder for example, where we construct builder using New, and then sometimes set additional attributes afterwards.

@rytaft
Copy link
Collaborator Author

rytaft commented Sep 24, 2019

Closing this due to the performance issues identified here: #33660 (comment)

Unfortunately reducing the batch size to 1000 for stats seems to have a negative impact on performance. I ran the tpccbench/nodes=3/cpu=16 roachtest twice with the stats batch size set to 1000, and twice with the batch size set to 10000. With 1000, max warehouses was 1315 and 1365. With 10000, it was 1605 and 1685. I'm not sure exactly why the difference is so great, but it seems clear that I should not merge #40850.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: hotspotsplits/nodes=4 failed
4 participants