-
Notifications
You must be signed in to change notification settings - Fork 471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance tuning tutorial #3378
Conversation
8f07c9c
to
9ae41bf
Compare
9ae41bf
to
48ae323
Compare
83f7d50
to
10d9ee3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, it is very readable.
v2.0/performance-tuning.md
Outdated
CockroachDB requires TCP communication on two ports: | ||
|
||
- **26257** (`tcp:26257`) for inter-node communication (i.e., working as a cluster) | ||
- **8080** (`tcp:8080`) for accessing the Admin UI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this needs to become 'web ui', here and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
v2.0/performance-tuning.md
Outdated
|
||
2. Note the internal IP address of each `n1-standard-4` instance. You'll need these addresses when starting the CockroachDB nodes. | ||
|
||
3. SSH to each instance and [optimize the local SSD for write performance](https://cloud.google.com/compute/docs/disks/performance#optimize_local_ssd) (see the **Disable write cache flushing** section). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reader will see this and pause and think "wait, what?" (like I just did)
disabling write cache flushing means the data becomes less "safe" -- it disables the guarantee that cockroachdb expects from the underlying drive that flushed data is guaranteed to be written to persistent storage.
If you're 100% sure about yourself here, you still need a callout underneath to reassure the reader that this is OK (and explain them why).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I assumed this was safe since it's a technique that we're featuring in our tpc-c benchmarking. @nvanbenschoten, @bdarnell, can either of you confirm that this is in fact safe and, if so, help me come up with language that clarifies that fact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, just to be clear, @knz, the reason I'm featuring this is, based on my testing, it significantly improves performance. @robert-s-lee is also impressed with the speedup it gives you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc. @petermattis and @arjunravinarayan. I think they'll be able to give the most accurate answers to the following questions: "is it safe?" and "should we recommend it?".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is not safe to do on cloud instances. It is what one would do to replicate the performance characteristics of an on-prem capacitor-backed SSDs would give you, and this is why we have these instructions in our performance whitepaper. But I do not believe it is suitable guidance for a performance tuning guide in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks, @arjunravinarayan. If @petermattis agrees that this is not safe to do on cloud instances, I'll remove this and rerun all the numbers.
v2.0/performance-tuning.md
Outdated
REFERENCES vehicles (city, id); | ||
~~~ | ||
|
||
**Add a note about why we need vehicle_city and link to ticket. Basically, we can't have 2 foreign keys constraints on the same column (city). So we duplicate city as vehicle_city and add check constraint to ensure that they are identical. https://github.com/cockroachdb/cockroach/issues/23580** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the issue URL a link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note in the ### Schema section
. PTAL.
v2.0/performance-tuning.md
Outdated
|
||
With this information, we can visualize what's happening, assuming the request is sent to node 1 and ignoring non-involved ranges: | ||
|
||
<img src="{{ 'images/v2.0/perf_tuning_join1.png' | relative_url }}" alt="Perf tuning concepts" style="max-width:100%" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous diagrams were OK to follow along; this one in contrast has me thoroughly confused: I do not understand the meaning of the arrows that jump from one "lease" box to another on node 2.
If you mean to explain that node 1 is forwarding the full table scan to every range in the "rides" table, that must be done by using:
- 6 arrows from node 1 to node 2 (you may group them together because they actually flow along a single network link)
- 1 arrow from node 1 to node 3
- 6 arrows back from node 2 to the outside of node 1 (for the results)
- 1 arrow back from node 3 to the outside of node 1
Then for the join there is no extra arrow needed because node 1 is already leaseholder.
There should not be any arrow from node 2 to itself, and neither from node 2 to the "lease" box of node 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @knz. Do you think this section would be sufficient without the diagram? I think they were important in the intro concepts, but I'm unsure whether they're as useful embedded in the tutorial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the diagram is not necessary here indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it.
v2.0/performance-tuning.md
Outdated
(21 rows) | ||
~~~ | ||
|
||
This is a complex query plan, but the important thing to note is the full table scan of `rides@primary` above the `subquery`. This shows you that, after the subquery returns the IDs of the top 5 vehicles, CockroachDB scans the entire primary index to find the rows with `max(end_time)` for each `vehicle_id`, although you might expect CockroachDB to more efficiently use the secondary index on `vehicle_id`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention that we're working to lift this limitation in a future version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Discussed offline:
|
v2.0/performance-tuning.md
Outdated
CockroachDB requires TCP communication on two ports: | ||
|
||
- **26257** (`tcp:26257`) for inter-node communication (i.e., working as a cluster) | ||
- **8080** (`tcp:8080`) for accessing the Admin UI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
v2.0/performance-tuning.md
Outdated
REFERENCES vehicles (city, id); | ||
~~~ | ||
|
||
**Add a note about why we need vehicle_city and link to ticket. Basically, we can't have 2 foreign keys constraints on the same column (city). So we duplicate city as vehicle_city and add check constraint to ensure that they are identical. https://github.com/cockroachdb/cockroach/issues/23580** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note in the ### Schema section
. PTAL.
v2.0/performance-tuning.md
Outdated
|
||
With this information, we can visualize what's happening, assuming the request is sent to node 1 and ignoring non-involved ranges: | ||
|
||
<img src="{{ 'images/v2.0/perf_tuning_join1.png' | relative_url }}" alt="Perf tuning concepts" style="max-width:100%" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it.
v2.0/performance-tuning.md
Outdated
|
||
With this information, we can visualize what's happening now, still assuming the request is sent to node 1 and ignoring non-involved ranges: | ||
|
||
<img src="{{ 'images/v2.0/perf_tuning_join2.png' | relative_url }}" alt="Perf tuning concepts" style="max-width:100%" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also removed this image.
v2.0/performance-tuning.md
Outdated
(21 rows) | ||
~~~ | ||
|
||
This is a complex query plan, but the important thing to note is the full table scan of `rides@primary` above the `subquery`. This shows you that, after the subquery returns the IDs of the top 5 vehicles, CockroachDB scans the entire primary index to find the rows with `max(end_time)` for each `vehicle_id`, although you might expect CockroachDB to more efficiently use the secondary index on `vehicle_id`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained
v2.0/performance-tuning.md, line 141 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
OK, thanks, @arjunravinarayan. If @petermattis agrees that this is not safe to do on cloud instances, I'll remove this and rerun all the numbers.
Ditto what @arjunravinarayan said. Note that Google Persistent SSD does not flush to the storage media (effectively acting like nobarrier is set), while Local SSD does. If running a high performance database on-prem you'd get hardware that had capacitor-backed or battery-backed SSDs. Future versions of cloud datacenters may have reliable enough batteries to allow recommending this as the default (and I've heard murmurings that some current Google datacenters have reliable batteries), but right now the recommendation should be that you have to know your hardware in order to specify nobarrier
and the cloud providers do not provide enough transparency to do this safely.
v2.0/performance-tuning.md, line 141 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
OK. Thanks, all. I'll remove this. |
2cfba8d
to
9356ba4
Compare
Roachprod commands for single-region portion:
Roachprod commands for multi-region portion:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained
_includes/v2.0/performance/tuning.py, line 11 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Add
type=int
here, and then you can remove theint()
aroundargs.repeat
below.
Done.
_includes/v2.0/performance/tuning.py, line 29 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Just use
n
here; you don't need a separatecount
variable.
Done.
_includes/v2.0/performance/tuning.py, line 39 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Do you really need to catch all exceptions here? I would just remove the try/except block, or if you really want to catch and ignore some exceptions, use a narrower
except
clause.
This is just in place to prevent exceptions for writes, where there is no colnames
or rows
to print, e.g.:
python tuning.py --host=10.142.0.42 --statement="INSERT INTO users VALUES (gen_random_uuid(), 'new york', 'Max Roach', '411 Drum Street', '173635282937347')" --repeat=100 --times --cumulative
Traceback (most recent call last):
File "tuning.py", line 29, in <module>
colnames = [desc[0] for desc in cur.description]
TypeError: 'NoneType' object is not iterable
Is there a better way?
_includes/v2.0/performance/tuning.py, line 41 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Since we're mostly dealing with very small values, the output would be more reasonable if this were
times.append((end - start) * 1000)
and change "seconds" to "milliseconds" everywhere.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained
_includes/v2.0/performance/tuning.py, line 39 at r2 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
This is just in place to prevent exceptions for writes, where there is no
colnames
orrows
to print, e.g.:python tuning.py --host=10.142.0.42 --statement="INSERT INTO users VALUES (gen_random_uuid(), 'new york', 'Max Roach', '411 Drum Street', '173635282937347')" --repeat=100 --times --cumulative
Traceback (most recent call last): File "tuning.py", line 29, in <module> colnames = [desc[0] for desc in cur.description] TypeError: 'NoneType' object is not iterable
Is there a better way?
In that case you can guard it all like this:
if cur.description is not None:
colnames = [desc[0] for desc in cur.description]
6e2ce87
to
ffba014
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained
_includes/v2.0/performance/tuning.py, line 1 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Add
#!/usr/bin/env python
as the first line of this script. (orpython3
. I think this script will work as-is on python 3, and that's the version that is preinstalled on current ubuntu).One specific benefit of python 3 for this script is the new
statistics
module which would make it easy to print the standard deviation and median (not just the mean) without installing more packages. The stddev would be especially useful for the partitioning examples.
I'm going with ubuntu 16.04, since that's what we use internally. Seems like it's still python2 on that image, so I'll stick with that for now.
_includes/v2.0/performance/tuning.py, line 39 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
In that case you can guard it all like this:
if cur.description is not None: colnames = [desc[0] for desc in cur.description]
Done.
v2.0/performance-tuning.md, line 30 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I don't see any mention of which image to use here. (although the use of
apt
below implies the use of something in the debian/ubuntu family instead of fedora/redhat). We should specify the base image here (presumably ubuntu 18.04)
Done. Again, going with ubuntu 16.04, since that's what we use internally.
v2.0/performance-tuning.md, line 31 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Instead of "read/write testing", I'd call this "For running the client application workload".
Done.
v2.0/performance-tuning.md, line 128 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
s/admin/web/
Done.
v2.0/performance-tuning.md, line 169 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
We know this will give a permissions error on the VM setup we're recommending, so just put
sudo
in the command already.
Done.
v2.0/performance-tuning.md, line 269 at r2 (raw file):
Previously, knz (kena) wrote…
ensure a consistent use of tabs and spaces in this example.
Done.
v2.0/performance-tuning.md, line 406 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
If this is the only external package you need, it might be better to just install it with apt-get instead of pip (using pip without virtualenv is prone to trouble). It also saves a step:
apt-get install python-psycopg2
instead ofapt-get install python-pip && pip install psycopg2-binary
.Or, as noted above, ubuntu 18.04 doesn't install python 2 by default, so it would be installed here. We could use python 3 instead (which will download and install less stuff here) with
apt-get install python3-psyopg2
Done.
Going with ubuntu 16.04 since that's what we use internally.
v2.0/performance-tuning.md, line 450 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Optional: if you add
chmod +x tuning.py
after downloading it, the invocations can just be./tuning.py
instead ofpython tuning.py
. This lets you avoid repeating the version of python used.
Done.
ffba014
to
66f14f8
Compare
|
66f14f8
to
85c38b1
Compare
Thanks, @robert-s-lee. I've changed compound to composite (based on this wikipedia entry, I was just using the wrong term), and I've included more details about how the leaseholder mechanism bypasses Raft. I'm not sure how to proceed on the other points. We can talk about them in person. |
@robert-s-lee, I expanded "Important concepts" to cover more about the Raft log and how it plays into writes, and I added a brief emphasis of network and disk i/o as performance bottlenecks. I don't think we can go into greater details in this tutorial without making the material much harder to parse. But I opened a docs issue to document the read and write paths in more detail separately. PTAL. |
2a85b3b
to
0405709
Compare
0405709
to
87c1f10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holy moly, this is incredible! Such an epic undertaking handled so deftly.
I left a small number of small comments, though I have two larger/structural things I'd consider.
Modularization
This is an epic guide, which might be daunting for some people to delve all the way into. If there's a way in which we could break the content out into smaller, more targeted use cases, I wonder if it might be slightly easier to get people to engage with it?
One possible tack for this would be to convert the meat of the content into a number of includes, which you could then embed in both this guide, as well as some smaller ones. Given the structure of this guide, it seems like a possible treatment would be four smaller guides:
- Optimize Writes/Reads for Single-Region Deployments (i.e. 2 guides; one for reads, one for writes)
- Optimize Writes/Reads for Multi-Region Deployments (ditto)
That being said, this suggestion is more intuition and idle speculation.
Making Promises Up Front
You did so much work to gather this information and substantiate the claims we make. I'd love to see more of this info surfaced higher in the document and made more of a headline, e.g.
"Using the procedures outlined in this guide we improved:
- Reads in a single region by X%
- Writes in a single region by Y%
- Reads in multiple regions by Z%
- Writes in multiple regions by W%"
Obviously you'd need a caveat blah blah, but I think that expressing the value of this guidance with numbers could make it much more exciting.
That all being said, this is still such an impressive guide. Bravo.
Reviewable status:
complete! 0 of 0 LGTMs obtained
images/v2.0/perf_tuning_movr_schema.png, line 0 at r3 (raw file):
I would suggest changing the order of id
and city
in these tables. That they're in the inverted order feels unintuitive.
v2.0/performance-tuning.md, line 35 at r3 (raw file):
### Schema You'll use sample schema and data for Cockroach Lab's fictional vehicle-sharing company, [MovR](https://github.com/cockroachdb/movr):
Having a brief description of what this schema represents might be helpful for creating a mental model of what these things represent.
v2.0/performance-tuning.md, line 216 at r3 (raw file):
~~~ 4. Start the [built-in SQL shell](use-the-built-in-sql-client.html), pointing it one of the CockroachDB nodes:
point it at one
v2.0/performance-tuning.md, line 344 at r3 (raw file):
Referencing columns | Referenced columns --------------------|------------------- `vehicles.city/vehicles.owner_id` | `users.city/users.id`
nit: I was expecting these slashes to be commas
v2.0/performance-tuning.md, line 535 at r3 (raw file):
#### Filtering by a secondary index To speed up this query, add a secondary index on `name`:
This feels like it ought to be run through cockroach
instead of tuning.py
; ditto for all of the DDL statements.
v2.0/performance-tuning.md, line 944 at r4 (raw file):
~~~ This tells us that the index is stored in 2 ranges, with the leaseholders for both of them on node 1. We already know that the leaseholder for the `users` table is on node 2.
It's unclear to me what "We already know that the leaseholder for the users
table is on..." is referring to.
v2.0/performance-tuning.md, line 1360 at r4 (raw file):
--> Given that Movr is active on both US coasts, you'll now scale the cluster into two new regions, us-west1-a and us-west2-a, each with 3 nodes and an extra instance for simulating regional client traffic.
us-west1-a & co. could maybe use some kind of consistent highlighting?
87c1f10
to
8b04a2b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review, @sploiselle! I made all of the small changes, and I'll add the promises up front as you suggest soon. I also agree that this is way too big and overwhelming for most users. Since I've been working on this version for a long while already, I'll publish the epic now and revisit ways to make it more modular and digestible as a next step. I like your ideas, though.
Reviewable status:
complete! 0 of 0 LGTMs obtained
images/v2.0/perf_tuning_movr_schema.png, line at r3 (raw file):
Previously, sploiselle (Sean Loiselle) wrote…
I would suggest changing the order of
id
andcity
in these tables. That they're in the inverted order feels unintuitive.
I agree, and I wish we could! Because we're partitioning on city
later, in the multi-region phase, city
needs to be the first column in the primary key.
v2.0/performance-tuning.md, line 35 at r3 (raw file):
Previously, sploiselle (Sean Loiselle) wrote…
Having a brief description of what this schema represents might be helpful for creating a mental model of what these things represent.
Done.
v2.0/performance-tuning.md, line 216 at r3 (raw file):
Previously, sploiselle (Sean Loiselle) wrote…
point it at one
Done.
v2.0/performance-tuning.md, line 535 at r3 (raw file):
Previously, sploiselle (Sean Loiselle) wrote…
This feels like it ought to be run through
cockroach
instead oftuning.py
; ditto for all of the DDL statements.
Done.
v2.0/performance-tuning.md, line 944 at r4 (raw file):
Previously, sploiselle (Sean Loiselle) wrote…
It's unclear to me what "We already know that the leaseholder for the
users
table is on..." is referring to.
Yeah, it's a few paragraphs above. I've made this a bit more explicit, but it's probably still tricky to follow.
v2.0/performance-tuning.md, line 1360 at r4 (raw file):
Previously, sploiselle (Sean Loiselle) wrote…
us-west1-a & co. could maybe use some kind of consistent highlighting?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained
images/v2.0/perf_tuning_movr_schema.png, line at r3 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
I agree, and I wish we could! Because we're partitioning on
city
later, in the multi-region phase,city
needs to be the first column in the primary key.
Would it be possible to just put city first in the diagram?
@jseldess - this is really incredible. I didn't run into any edits or nits worth picking. I second Sean's thoughts on the epic scale. One thought while I was reading through it: maybe a series of blog posts? In any case, agreed that publishing now and modularizing later is a good approach. |
8b04a2b
to
6e3f04d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR, @tim-o. Blog posts are certainly an option. In any case, I need to work on making the thing more digestible.
Reviewable status:
complete! 0 of 0 LGTMs obtained
images/v2.0/perf_tuning_movr_schema.png, line at r3 (raw file):
Previously, sploiselle (Sean Loiselle) wrote…
Would it be possible to just put city first in the diagram?
Yes. I'm out of time for today's release, though, so I'll make this a follow-up.
This PR adds a 2.0 performance tuning tutorial. It starts with a single-region deployment, focusing on common SQL techniques for getting faster reads and writes. It then expands into a multi-region deployment, focusing on table partitioning.
The PR also includes a simple Python client for read/write testing. It hopefully makes it easier to run a given statement multiple times and look at the average and/or cumulative latency.
For internal testers, I've added the corresponding
roachprod
commands as comments in theperformance-tuning.md
file and in a separate comment below.Fixes #3160.