-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: SQL schema changes are too slow for ORM tests to adopt #71800
Comments
I'm almost certain there will be low-hanging fruit here. The numbers here are still ~small. We've got a bunch of places where we're waiting for things and then we're waiting for things waiting for things and all of it is happening with exponential backoff. One tweak is to tune the exponential backoff of the polling loops to be faster. If that can help by even 50% it'd be a big win. I'm going to try to get this running. |
Alright, I've spent a good bit of the day hacking on this. I've got some code changes that I think can get another 10-15% but I'm not sure they're nearly as important as some of these cluster settings to keep the jobs table garbage down and to keep the range count down. It's sort of disappointing when it works out that way, but alas. Also, the tests as they are fail after what I surmise to be about 1/7th. I do have one little patch that lets us merge ranges. I don't know how much it matters. A very strange observations, using a real disk but with fsync's turned off seems faster than demo. I don't understand that one really at all. With demo, i get: Here's my reproducible result: export COCKROACH_BINARY=cockroach-hacks
cat > setup.sql <<-EOF
SET CLUSTER SETTING "sql.defaults.default_int_size" = 4;
SET CLUSTER SETTING "sql.defaults.serial_normalization" = 'sql_sequence';
SET CLUSTER SETTING "sql.defaults.propagate_input_ordering.enabled" = false;
SET CLUSTER SETTING "schemachanger.backfiller.buffer_increment" = '128 KiB';
SET CLUSTER SETTING "kv.raft_log.disable_synchronization_unsafe" = true;
SET CLUSTER SETTING "jobs.registry.interval.gc" = '30s';
SET CLUSTER SETTING "kv.range_merge.queue_interval" = '50ms';
SET CLUSTER SETTING "jobs.retention_time" = '15s';
SET CLUSTER SETTING "jobs.registry.interval.cancel" = '180s';
SET CLUSTER SETTING "sql.stats.automatic_collection.enabled" = false;
ALTER RANGE default CONFIGURE ZONE USING "gc.ttlseconds" = '5';
ALTER DATABASE system CONFIGURE ZONE USING "gc.ttlseconds" = '5';
SET CLUSTER SETTING "kv.raft_log.disable_synchronization_unsafe" = true;
EOF
roachprod wipe local && \
roachprod put local ./cockroach $COCKROACH_BINARY && \
roachprod start local --binary $COCKROACH_BINARY && \
roachprod sql local < setup.sql && \
roachprod stop local && \
roachprod start local --binary cockroach-hacks --args='--sql-addr=:5436' --skip-init Followed by: cargo test -p query-engine-tests
|
#71899 came out of this |
Okay, right, so you add the hack to avoid gossiping the system config span into the mix and things start to look pretty good
|
nice! i wonder if we can apply this to our activerecord tests and see if they take < 2.5 hours |
what do you mean by this? yeah there are a few failing tests remaining, i'll get to them eventually. |
I just meant that the test run is like 1/7th of the 7000 number you quoted above. Want to tell me how to run them all? |
oh! revert prisma/prisma-engines@2a4e991 to run all 18000 run |
Down to 75mins from 150mins with some of the cluster settings mentioned, as per cockroachdb/activerecord-cockroachdb-adapter#233. |
With one more commit that is not ready to be seen, this is looking pretty reasonable. It's not <15m, but it's <30m. This was my laptop. It's totally CPU bound. My guess is the next best thing would be to work on the lease draining propagation. I doubt there's more than 10% to get out of that. I took some profiles. The only "smoking gun" is 20% in
I'll run it again as I go to dinner to see what it looks like without that fancy jobs change. |
Okay, seems like this jobs change matters. |
is it possible to get a branch with your intermediate fixes @ajwerner ? i can make the prisma suite run against a custom build for now to get the ball rolling and have it be properly regression tested on their side. |
We don't. I think I removed it In later iterations |
We've done a bunch of work here. I'm closing this. |
Have discussed this privately with @ajwerner, but filing a tracking issue to centralise discussions.
We have, in the past, had issues with ORMs needing a lot longer to run CI tests compared to Postgres.
Using prisma as an example, it takes over 5 hours to run 7000 tests in their test suite (it's a subset of their entire suite). Each test creates a database and a few tables, and a few indexes after the tables are created. Even after dropping the database after each run and setting a low gc.ttlseconds, there are still issues. Postgres/MySQL take <15mins which contains over 18000 tests. ActiveRecord is another example, 2.5 hours in CockroachDB, <15mins for Postgres.
Running Prisma Tests
jaegar
docker run -d --name jaeger -p 6831:6831/udp -p 16686:16686 jaegertracing/all-in-one:latest
localhost:16686
CockroachDB
and
make buildshort (or whatnot)
COCKROACH_JAEGER=localhost ./cockroach demo --insecure --empty --sql-port 5436 --max-sql-memory '8GiB'
set cluster setting sql.defaults.default_int_size = 4; set cluster setting sql.defaults.serial_normalization = 'sql_sequence'; SET CLUSTER SETTING sql.defaults.propagate_input_ordering.enabled = false; SET CLUSTER SETTING schemachanger.backfiller.buffer_increment = '128 KiB'; alter range default configure zone using gc.ttlseconds = '10'; -- prisma
prisma
in the dir then run:
cargo test -p query-engine-tests
- tests start off quick then slow downEpic CRDB-10719
The text was updated successfully, but these errors were encountered: