sql: SQL schema changes are too slow for ORM tests to adopt #71800

otan · 2021-10-21T06:27:34Z

Have discussed this privately with @ajwerner, but filing a tracking issue to centralise discussions.

We have, in the past, had issues with ORMs needing a lot longer to run CI tests compared to Postgres.

Using Prisma as an example, it takes over 5 hours to run 7000 tests in their test suite (a subset of their entire suite). Each test creates a database and a few tables, then a few indexes once the tables exist. Even after dropping the database after each run and setting a low gc.ttlseconds, there are still issues. Postgres/MySQL take <15 mins for the full suite of over 18000 tests. ActiveRecord is another example: 2.5 hours on CockroachDB, <15 mins on Postgres.
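To put the gap in per-test terms, here is a rough back-of-the-envelope sketch. It treats the bounds quoted above ("over 5 hours", "<15mins") as if they were exact figures, which is an assumption, so the ratio is only approximate.

```python
# Rough per-test cost implied by the figures quoted above.
# Assumption: the quoted bounds are treated as exact values.

def seconds_per_test(total_seconds: float, test_count: int) -> float:
    """Average wall-clock seconds spent per test."""
    return total_seconds / test_count

crdb = seconds_per_test(5 * 3600, 7000)      # CockroachDB: ~5 h for ~7000 tests
postgres = seconds_per_test(15 * 60, 18000)  # Postgres/MySQL: ~15 min for ~18000 tests

print(f"CockroachDB:    {crdb:.2f} s/test")
print(f"Postgres/MySQL: {postgres:.3f} s/test")
print(f"ratio:          ~{crdb / postgres:.0f}x")
```

Under those assumptions the average works out to roughly 2.6 s per test on CockroachDB against 0.05 s on Postgres/MySQL, a gap of about 50x.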

Running Prisma Tests

Jaeger

docker run -d --name jaeger -p 6831:6831/udp -p 16686:16686 jaegertracing/all-in-one:latest
open localhost:16686

CockroachDB

pull https://github.com/cockroachdb/cockroach/compare/master...otan-cockroach:fresh_prototype?expand=1
and make buildshort (or whatnot)
start Cockroach: COCKROACH_JAEGER=localhost ./cockroach demo --insecure --empty --sql-port 5436 --max-sql-memory '8GiB'
run:

SET CLUSTER SETTING sql.defaults.default_int_size = 4;
SET CLUSTER SETTING sql.defaults.serial_normalization = 'sql_sequence';
SET CLUSTER SETTING sql.defaults.propagate_input_ordering.enabled = false;
SET CLUSTER SETTING schemachanger.backfiller.buffer_increment = '128 KiB';
ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds = '10'; -- prisma

prisma

clone https://github.com/prisma/prisma-engines
pull https://github.com/prisma/prisma-engines/compare/master...otan-cockroach:prototype_branch?expand=1
in the dir then run:

echo 'cockroach' > current_connector
cp ./query-engine/connector-test-kit-rs/test-configs/cockroach .test_config

Run cargo test -p query-engine-tests. The tests start off quick, then slow down.

Epic CRDB-10719

otan · 2021-10-21T06:30:24Z

Looking at the traces, most of the time is spent waiting for jobs to complete:

Unfortunately the job traces are not too useful:

rafiss · 2021-10-21T21:04:49Z

Last time there was a deep investigation into this problem (#47790) it led to this fix relating to jobs adoption: #48608

I wonder if the other ideas in #47790 are still worth exploring (asking as a complete noob).

ajwerner · 2021-10-21T21:19:24Z

I'm almost certain there will be low-hanging fruit here. The numbers here are still ~small. We've got a bunch of places where we're waiting for things and then we're waiting for things waiting for things and all of it is happening with exponential backoff. One tweak is to tune the exponential backoff of the polling loops to be faster. If that can help by even 50% it'd be a big win. I'm going to try to get this running.
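To illustrate why the backoff tuning could matter, here is a minimal sketch of a polling loop with exponential backoff. The function name and every interval, multiplier, and cap below are hypothetical values chosen for illustration; they are not CockroachDB's actual polling parameters.

```python
# Sketch: how long a backoff poller takes to notice a finished job.
# All numbers are illustrative assumptions, not CockroachDB's real settings.

def observed_latency_ms(job_ms, initial_ms, multiplier, max_ms):
    """Time at which a backoff poller first sees a job that finishes at job_ms.

    The poller sleeps initial_ms, checks, then multiplies its sleep by
    `multiplier` (capped at max_ms) before each subsequent check.
    """
    elapsed = 0
    interval = initial_ms
    while True:
        elapsed += interval      # sleep, then check
        if elapsed >= job_ms:    # job finished during or before this sleep
            return elapsed
        interval = min(interval * multiplier, max_ms)

job = 1000  # hypothetical job that completes after 1000 ms
slow = observed_latency_ms(job, initial_ms=100, multiplier=2, max_ms=10_000)
fast = observed_latency_ms(job, initial_ms=10, multiplier=2, max_ms=100)
print(f"slow poller: sees completion at {slow} ms ({slow - job} ms wasted)")
print(f"fast poller: sees completion at {fast} ms ({fast - job} ms wasted)")
```

In this toy model the uncapped poller wastes 500 ms on a 1000 ms job, while the tightly capped one wastes 50 ms. When one polling loop waits on another, the wasted time of each layer adds up, which is the "waiting for things waiting for things" effect described above.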

ajwerner · 2021-10-22T23:10:44Z

Alright, I've spent a good bit of the day hacking on this. I've got some code changes that I think can get another 10-15% but I'm not sure they're nearly as important as some of these cluster settings to keep the jobs table garbage down and to keep the range count down. It's sort of disappointing when it works out that way, but alas. Also, the tests as they are fail after what I surmise to be about 1/7th. I do have one little patch that lets us merge ranges. I don't know how much it matters.

A very strange observation: using a real disk with fsyncs turned off seems faster than demo. I don't understand that one at all.

With demo, I get:
test result: FAILED. 968 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 476.94s

Here's my reproducible result:

export COCKROACH_BINARY=cockroach-hacks
cat > setup.sql <<-EOF
SET CLUSTER SETTING "sql.defaults.default_int_size" = 4;
SET CLUSTER SETTING "sql.defaults.serial_normalization" = 'sql_sequence';
SET CLUSTER SETTING "sql.defaults.propagate_input_ordering.enabled" = false;
SET CLUSTER SETTING "schemachanger.backfiller.buffer_increment" = '128 KiB';
SET CLUSTER SETTING "kv.raft_log.disable_synchronization_unsafe" = true;
SET CLUSTER SETTING "jobs.registry.interval.gc" = '30s';
SET CLUSTER SETTING "kv.range_merge.queue_interval" = '50ms';
SET CLUSTER SETTING "jobs.retention_time" = '15s';
SET CLUSTER SETTING "jobs.registry.interval.cancel" = '180s';
SET CLUSTER SETTING "sql.stats.automatic_collection.enabled" = false;
ALTER RANGE default CONFIGURE ZONE USING "gc.ttlseconds" = '5';
ALTER DATABASE system CONFIGURE ZONE USING "gc.ttlseconds" = '5';
EOF
roachprod wipe local && \
  roachprod put local ./cockroach "$COCKROACH_BINARY" && \
  roachprod start local --binary "$COCKROACH_BINARY" && \
  roachprod sql local < setup.sql && \
  roachprod stop local && \
  roachprod start local --binary "$COCKROACH_BINARY" --args='--sql-addr=:5436' --skip-init

Followed by:

cargo test -p query-engine-tests

test result: FAILED. 968 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 188.80s

ajwerner · 2021-10-22T23:15:17Z

#71899 came out of this

ajwerner · 2021-10-23T00:25:59Z

Okay, right, so add the hack that avoids gossiping the system config span into the mix, and things start to look pretty good:
SET CLUSTER SETTING "sql.catalog.unsafe_skip_system_config_trigger" = true;

test result: FAILED. 968 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 44.77s
test result: FAILED. 968 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 46.84s
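For a rough sense of scale, the three timings reported in this thread for the same 973-test run can be compared directly. This is only arithmetic on the numbers quoted above; the labels are shorthand, and the runs were on one machine, so the ratios should be read as indicative.

```python
# Speedup of each configuration relative to the `cockroach demo` run.
# Timings are copied from the test results quoted in this thread.

timings_s = [
    ("cockroach demo", 476.94),
    ("roachprod + setup.sql settings", 188.80),
    ("+ unsafe_skip_system_config_trigger", 44.77),  # first of the two runs
]

def speedup(baseline_s: float, new_s: float) -> float:
    """How many times faster new_s is than baseline_s."""
    return baseline_s / new_s

baseline = timings_s[0][1]
for label, secs in timings_s:
    print(f"{label}: {secs:.2f}s ({speedup(baseline, secs):.1f}x vs demo)")
```

That is roughly 2.5x from the cluster settings and about 10.7x once the system config trigger is skipped.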

otan · 2021-10-23T09:06:58Z

nice! i wonder if we can apply this to our activerecord tests and see if they take < 2.5 hours

otan · 2021-10-24T20:56:20Z

> Also, the tests as they are fail after what I surmise to be about 1/7th

what do you mean by this? yeah there are a few failing tests remaining, i'll get to them eventually.

ajwerner · 2021-10-24T22:29:05Z

I just meant that the test run is like 1/7th of the 7000 number you quoted above. Want to tell me how to run them all?

otan · 2021-10-24T23:28:56Z

oh! revert prisma/prisma-engines@2a4e991 to run all 18000

run SIMPLE_TEST_MODE=on cargo test -p query-engine-tests for the ~7k set

otan · 2021-10-25T01:43:10Z

Down to 75mins from 150mins with some of the cluster settings mentioned, as per cockroachdb/activerecord-cockroachdb-adapter#233.

ajwerner · 2021-10-25T23:13:59Z

With one more commit that is not ready to be seen, this is looking pretty reasonable. It's not <15m, but it's <30m. This was my laptop. It's totally CPU bound. My guess is the next best thing would be to work on the lease draining propagation. I doubt there's more than 10% to get out of that. I took some profiles. The only "smoking gun" is 20% in findrunnable, which might have to do with some of the busy waiting and spinning we do in a bunch of places.

test result: FAILED. 18420 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1706.80s
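As a quick sanity check on the "<30m" figure, the result above can be converted to minutes and to an average per-test cost. The 15-minute target is taken from the Postgres number quoted at the top of the issue and is treated as exact here, which is an assumption.

```python
# Convert the full-suite result into minutes and per-test cost.

total_s = 1706.80
tests_run = 18420 + 5              # passed + failed
target_s = 15 * 60                 # assumed target: the "<15mins" Postgres figure

minutes = total_s / 60
ms_per_test = 1000 * total_s / tests_run
over_target = total_s / target_s   # remaining factor needed to reach the target

print(f"{minutes:.1f} min total, {ms_per_test:.1f} ms/test, {over_target:.1f}x over target")
```

That is about 28.4 minutes, or roughly 93 ms per test, leaving a bit under 2x to go before matching the 15-minute mark.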

I'll run it again as I go to dinner to see what it looks like without that fancy jobs change.

ajwerner · 2021-10-26T13:13:15Z

Okay, seems like this jobs change matters.

otan · 2021-10-27T05:24:10Z

is it possible to get a branch with your intermediate fixes @ajwerner ? i can make the prisma suite run against a custom build for now to get the ball rolling and have it be properly regression tested on their side.

vy-ton · 2021-12-01T17:10:20Z

@otan for when you're back, I'd like to understand why we needed to set sql.defaults.propagate_input_ordering.enabled. This is relevant to Queries since we introduced this setting to help proceed with Graphile compatibility. FYI @rytaft

otan · 2021-12-01T19:21:13Z

We don't. I think I removed it in later iterations.

ajwerner · 2022-01-06T05:22:15Z

We've done a bunch of work here. I'm closing this.

otan added the C-enhancement, A-schema-changes, and A-tools-prisma labels Oct 21, 2021

blathers-crl bot added the T-sql-schema-deprecated label Oct 21, 2021

otan changed the title ~~sql: SQL schema changes are too slow for ORM tests to adopt~~ sql: SQL schema changes are too slow that makes it hard for ORM tests to adopt Oct 21, 2021

otan changed the title ~~sql: SQL schema changes are too slow that makes it hard for ORM tests to adopt~~ sql: SQL schema changes slow which makes it hard for ORM tests to adopt Oct 21, 2021

otan changed the title ~~sql: SQL schema changes slow which makes it hard for ORM tests to adopt~~ sql: SQL schema changes are too slow for ORM tests to adopt Oct 21, 2021

rafiss added the A-tools-activerecord and A-tools-hibernate labels Oct 22, 2021

otan mentioned this issue Oct 24, 2021

build: try faster cluster settings for CI cockroachdb/activerecord-cockroachdb-adapter#233

Closed

ajwerner mentioned this issue Oct 25, 2021

jobs: don't block on notify #71909

Merged

ajwerner mentioned this issue Nov 1, 2021

jobs: add mechanism to communicate job completion in-memory locally #72297

Merged

ajwerner closed this as completed Jan 6, 2022

kocoten1992 mentioned this issue Feb 15, 2023

drop table and add/drop column still slow despite effort in #71800 #97195

Closed

exalate-issue-sync bot added the T-sql-foundations label and removed the T-sql-schema-deprecated label May 10, 2023
