distsql: tracking issue for queries we expect to run through DistSQL #14288
Ran
on a 3-node local cluster, using the TPC-H scale factor 1 dataset.
Hm, that's much slower than I'd have expected. Isn't DistSQL able to reduce network traffic on these queries to O(1)?
Yes, the network traffic is O(1) in these plans. Note that these are local clusters (all nodes on the same machine), so network traffic is not really network traffic.
Ah, understood.
@danhhz mentioned a SELECT COUNT(*) experiment on a real 3 node cluster (I think it was ~200M rows; SELECT COUNT(*) took 7m33s compared to over an hour).

Makes much more sense, thanks.
Indeed, it was the production lapis cluster.
Ran
on a 6-node cluster.
Ran
using f97a5c3 on a 6-node cluster.
Can you report the cockroach SHA you used, for future reference?
@asubiotto The ranges for the lineitem table seem to be spread across only 4 nodes.
Is there a quick way to find this information out? Last time I asked, there wasn't a clean way to figure out, given a table, which ranges the table was spread across.
Admin UI says there are 74 ranges for table lineitem (but I can't figure out how to find range IDs or node information for those ranges), so it being spread across just 4 nodes seems strange (but possible). |
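As an aside: later CockroachDB versions grew a statement for exactly this question; a hedged sketch, assuming a version in which `SHOW RANGES` is available (it was not at the time of this thread):

```sql
-- Lists the ranges backing the lineitem table, including lease holder
-- and replica placement per range.
SHOW RANGES FROM TABLE lineitem;
```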
Thank you @petermattis. It appears there are replicas and lease holders on every node, so let me investigate. |
I updated the execution plan. I think the gateway node probably had a cache that hadn't been updated. The query plan now shows only 5 TableReaders, which is still weird.
Actually, node 1 doesn't seem to be a lease holder for any range in lineitem.
Sure, but it was showing 4 before, and that was certainly incorrect. cc @andreimatei. |
The range-descriptor and leaseholder caches can be empty or stale. This explains it, right? The state of the caches is supposed to be seen in the
All these queries were run using 4129fe0 on a 6-node cluster. I started by trying to run
on 6 million rows (~2.5 GiB). I then moved down to
The query runs correctly, the execution plan looks good (note that nodes
I also ran:
to avoid running out of memory on the gateway node (execution plan here). The times for one run are as follows:
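The exact queries were not captured above; for context, the class of aggregations under discussion on the TPC-H scale factor 1 dataset looks roughly like this (table and column names are from the standard TPC-H schema; an illustrative sketch, not the exact queries run):

```sql
-- Full-table count over lineitem (~6 million rows at scale factor 1).
SELECT COUNT(*) FROM lineitem;

-- Distinct count: the set of distinct values must be tracked somewhere,
-- which is what can pressure memory on the gateway node.
SELECT COUNT(DISTINCT l_partkey) FROM lineitem;
```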
Queries run with
I forgot to include the execution plan, which looks good.
Queries run with
This query was constructed to have a sparse WHERE clause, and really nothing else. The
The execution plan looks good. The results are correct:
I did not add an
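The query text is elided above; a sparse-WHERE query of the kind described might look like this (an illustrative sketch against the TPC-H lineitem schema, not the actual query run):

```sql
-- A filter that matches few rows: every node scans its local ranges in
-- full, but only a small result streams back to the gateway.
SELECT * FROM lineitem WHERE l_extendedprice > 100000;
```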
Queries run with
The query is
The DistSQL execution plan is intimidating, but ultimately correct.
When run without the limit, both versions OOM:
While that is unsatisfactory and needs work, this query demonstrates that in both time and memory usage,
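The query itself is elided above; per the checklist item about limits after joins, it was presumably of this shape (a sketch using standard TPC-H table and column names, not the actual query):

```sql
-- Without the LIMIT, the join result is large enough to OOM either
-- execution engine; the LIMIT bounds what reaches the client, though
-- the join may still buffer large intermediate state.
SELECT o_orderkey, l_linenumber
FROM orders JOIN lineitem ON o_orderkey = l_orderkey
LIMIT 100;
```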
Query run using 630757c on a 6-node cluster.
The query runs correctly, the execution plan looks good, and the times for one run are below:
@asubiotto Almost linear speedup. Nice!
Queries run using 630757c on a 6-node cluster. I ran a variety of join queries, but I'm not documenting all of them, since they all have the same story: we always plan HashJoins with full bisection flows on all nodes that have a TableReader for that query. Sadly, this means we are very susceptible to running out of memory, which we still do on large datasets (and sometimes kill nodes as well, since the memory accounting guardrails are not in 630757c). Here is one sample execution plan: as you can see, the planner is planning HashJoins and doing full bisection flows between all the nodes.
There is speedup, so the HashJoin, while not the best possible plan for this query, is still a hefty improvement over local execution.
Closing this issue, as we have now empirically evaluated and learned the breadth and limits of our DistSQL processors and planning. All the credit to @asubiotto, who shepherded this through all those OOMs! 🎉
Spun up an azworker with the same specs as navy and ran all of these queries against postgres (TPC-H scale factor 1). These numbers are from one run only. Note that the single-node and distributed SQL numbers are from the runs above (copy-pasted for convenience) from 6-node clusters (only the first query was run on a 3-node local cluster).
cc @petermattis @arjunravinarayan
Thanks, @asubiotto. This will definitely motivate work in 1.1.
This is a TODO list for queries that we expect to run using DistSQL (in auto mode) by 1.0. Feel free to add as needed, cc @andreimatei @RaduBerinde @cuongdo @asubiotto.

- `SELECT COUNT(*)` should always run through DistSQL for speed reasons. It is a common operation just after loading a bunch of data. It is currently exceedingly slow on large tables (without DistSQL). @arjunravinarayan
- `SELECT COUNT(DISTINCT column_name)` @arjunravinarayan
- Queries with a `WHERE` clause. @arjunravinarayan
- `LIMIT` queries, particularly limits after `JOIN`s that would cause large amounts of state to stream across machines. @arjunravinarayan

If there are any queries that we want turned on by 1.0, please add them to the list above. Do not check off an item as done without adding a comment/issue tracking the queries actually attempted on a cluster. Try and report running times and `EXPLAIN(query)` output to show the DistSQL plan.