sql: foreign key batch checking per statement #26786
Comments
See #15157 if you haven't already, which has some prior discussion about this issue as well as some potential other strategies for dealing with it. |
One thing to note: when batching checks together per statement, make sure that moving to batching per transaction, or even reverting to per-row checks, remains possible. That way |
Oh, I missed this issue when creating #26795. Feel free to incorporate what you like into this one and close it. |
Please be mindful in these discussions that there may be too much work to defer: accumulating the work to be done may not fit in RAM.
That's why we have maximum batch sizes in many places.
As a matter of methodology any proposal should model memory usage as a function of the number of rows processed. There should be an upper bound that's independent from the number of rows.
(There is an exception which we may want to support later but **not in the general case and certainly not to enable optimizations**: the case where the client *mandates* deferred key checks in SQL. For those cases we must do clever buffering of the work, using a mix of RAM and disk storage so that RAM usage is properly bounded at all times. But beware that supporting *mandatory* deferred checks is fundamentally a different project, so don't let this avenue of thought distract you or lead you off on a tangent.)
Tobias Schottdorf <[email protected]> wrote on 18 June 2018 14:58:33 GMT-04:00:
…Oh, I missed this issue when creating #26795. Feel free to incorporate what you like into this one and close it.
|
Bram, my previous response was directed to you. I think:
1) it's not a good idea to explore inter-statement optimizations at this time (it's more complex and we must ensure the next step is tractable)
2) it's semantically incorrect to mix deferred FK checks mandated by the SQL client with optimization-driven batching of FK checks per statement, which requires (in the SQL language) non-deferred semantics. The proof of why this is true is left as an exercise to the reader, but I can provide some guidance upon request.
|
@knz, I agree. And of course we need to be cognizant of the missing functionality. Regardless, we need to ensure that we never run out of memory. |
The work on this did not make it into 2.1, but should be done in the 2.2 time frame. |
I just spent some time looking at traces from |
Please send this to Jordan and Andy to further motivate prioritization in 19.2.
Nathan VanBenschoten <[email protected]> wrote on 29 March 2019 06:07:47 CET:
…I just spent some time looking at traces from `new_order` transactions in TPC-C. On a cluster with a low amount of load, the transaction takes around `31ms`. The traces revealed that, on average, about `9ms` of the transaction are spent performing redundant foreign key lookups that could be eliminated if we collapsed foreign key checks across rows in the same statement. Put another way, 9 of the 23 kv batches issued by the transaction were superfluous and could be completely avoided by addressing this issue. That's an estimated 27% savings on the most common txn in TPC-C.
|
cc @awoods187 please refer to Nathan's comment above |
It's on the roadmap |
@awoods187 I was more thinking about copy-pasting his explanation into airtable, given that the initial motivation was one of correctness and use cases, and here we have a new argument that's performance-oriented. |
This will get done as part of the optimizer's work on foreign key planning, so I'm moving it to the planning project for tracking. |
Opt-driven FK checks are now enabled on master. |
Currently, foreign key checks are run for every row being inserted. We could optimize this by performing foreign key checks at the end of each statement, batching together all of the checks that belong to a single statement.
@BramGruneir @knz @jordanlewis
"if you insert 10 rows in a single statement, 8 of which have the same value for a column that has a foreign key relationship to another table, you shouldn’t have to check that relationship 8 times"
fk.go
do the foreign key checks, and possibly how those functions are called
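As an illustration of the per-statement batching proposed above, the following is a minimal sketch that collects the distinct foreign key values across all rows of one statement, so that each referenced value is looked up once. The names `fkCheck`, `row`, and `collectChecks` are hypothetical and made up for this example; they do not reflect the actual functions in `fk.go`.

```go
package main

import "fmt"

// fkCheck identifies one lookup: does key exist in refTable?
// Hypothetical type for illustration only.
type fkCheck struct {
	refTable string
	key      string
}

// row is a simplified inserted row: column name -> value.
type row map[string]string

// collectChecks returns the distinct checks needed for all rows of a single
// statement, in first-seen order. Rows without a value for fkCol are skipped,
// which stands in for a NULL foreign key column in this sketch.
func collectChecks(rows []row, fkCol, refTable string) []fkCheck {
	seen := make(map[fkCheck]struct{})
	var out []fkCheck
	for _, r := range rows {
		v, ok := r[fkCol]
		if !ok {
			continue
		}
		c := fkCheck{refTable: refTable, key: v}
		if _, dup := seen[c]; dup {
			continue
		}
		seen[c] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	// 10 rows, 8 of which share the same customer_id.
	var rows []row
	for i := 0; i < 8; i++ {
		rows = append(rows, row{"customer_id": "c1"})
	}
	rows = append(rows, row{"customer_id": "c2"}, row{"customer_id": "c3"})

	checks := collectChecks(rows, "customer_id", "customers")
	fmt.Printf("%d rows, %d lookups\n", len(rows), len(checks)) // prints "10 rows, 3 lookups"
}
```

Note that the deduplication map in this sketch grows with the number of distinct values in the statement, so a real implementation would still need a cap on batch size to keep memory bounded.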