Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: SQL schema changes are too slow for ORM tests to adopt #71800

Closed
otan opened this issue Oct 21, 2021 · 17 comments
Closed

sql: SQL schema changes are too slow for ORM tests to adopt #71800

otan opened this issue Oct 21, 2021 · 17 comments
Labels
A-schema-changes A-tools-activerecord A-tools-hibernate Issues that pertain to Hibernate integration. A-tools-prisma C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@otan
Copy link
Contributor

otan commented Oct 21, 2021

Have discussed this privately with @ajwerner, but filing a tracking issue to centralise discussions.

We have, in the past, had issues with ORMs needing a lot longer to run CI tests compared to Postgres.

Using prisma as an example, it takes over 5 hours to run 7000 tests in their test suite (it's a subset of their entire suite). Each test creates a database and a few tables, and a few indexes after the tables are created. Even after dropping the database after each run and setting a low gc.ttlseconds, there are still issues. Postgres/MySQL take <15mins which contains over 18000 tests. ActiveRecord is another example, 2.5 hours in CockroachDB, <15mins for Postgres.


Running Prisma Tests

jaegar

  • docker run -d --name jaeger -p 6831:6831/udp -p 16686:16686 jaegertracing/all-in-one:latest
  • open localhost:16686

CockroachDB

  • pull https://github.com/cockroachdb/cockroach/compare/master...otan-cockroach:fresh_prototype?expand=1
    and make buildshort (or whatnot)
  • start Cockroach COCKROACH_JAEGER=localhost ./cockroach demo --insecure --empty --sql-port 5436 --max-sql-memory '8GiB'
  • run set cluster setting sql.defaults.default_int_size = 4; set cluster setting sql.defaults.serial_normalization = 'sql_sequence'; SET CLUSTER SETTING sql.defaults.propagate_input_ordering.enabled = false; SET CLUSTER SETTING schemachanger.backfiller.buffer_increment = '128 KiB'; alter range default configure zone using gc.ttlseconds = '10'; -- prisma

prisma

echo 'cockroach' > current_connector
cp ./query-engine/connector-test-kit-rs/test-configs/cockroach .test_config
  • Run cargo test -p query-engine-tests - tests start off quick then slow down

Epic CRDB-10719

@otan otan added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-schema-changes A-tools-prisma labels Oct 21, 2021
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Oct 21, 2021
@otan
Copy link
Contributor Author

otan commented Oct 21, 2021

When looking at traces, there is heavy dominance on waiting for jobs to complete:

image

Unfortunately the job traces are not too useful:

image

@otan otan changed the title sql: SQL schema changes are too slow for ORM tests to adopt sql: SQL schema changes are too slow that makes it hard for ORM tests to adopt Oct 21, 2021
@otan otan changed the title sql: SQL schema changes are too slow that makes it hard for ORM tests to adopt sql: SQL schema changes slow which makes it hard for ORM tests to adopt Oct 21, 2021
@otan otan changed the title sql: SQL schema changes slow which makes it hard for ORM tests to adopt sql: SQL schema changes are too slow for ORM tests to adopt Oct 21, 2021
@rafiss
Copy link
Collaborator

rafiss commented Oct 21, 2021

Last time there was a deep investigation into this problem (#47790) it led to this fix relating to jobs adoption: #48608

I wonder if the other ideas in #47790 are still worth exploring (asking as a complete noob).

@ajwerner
Copy link
Contributor

I'm almost certain there will be low-hanging fruit here. The numbers here are still ~small. We've got a bunch of places where we're waiting for things and then we're waiting for things waiting for things and all of it is happening with exponential backoff. One tweak is to tune the exponential backoff of the polling loops to be faster. If that can help by even 50% it'd be a big win. I'm going to try to get this running.

@rafiss rafiss added A-tools-activerecord A-tools-hibernate Issues that pertain to Hibernate integration. labels Oct 22, 2021
@ajwerner
Copy link
Contributor

Alright, I've spent a good bit of the day hacking on this. I've got some code changes that I think can get another 10-15% but I'm not sure they're nearly as important as some of these cluster settings to keep the jobs table garbage down and to keep the range count down. It's sort of disappointing when it works out that way, but alas. Also, the tests as they are fail after what I surmise to be about 1/7th. I do have one little patch that lets us merge ranges. I don't know how much it matters.

A very strange observations, using a real disk but with fsync's turned off seems faster than demo. I don't understand that one really at all.

With demo, i get:
test result: FAILED. 968 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 476.94s


Here's my reproducible result:

export COCKROACH_BINARY=cockroach-hacks
cat > setup.sql <<-EOF
SET CLUSTER SETTING "sql.defaults.default_int_size" = 4;
SET CLUSTER SETTING "sql.defaults.serial_normalization" = 'sql_sequence';
SET CLUSTER SETTING "sql.defaults.propagate_input_ordering.enabled" = false;
SET CLUSTER SETTING "schemachanger.backfiller.buffer_increment" = '128 KiB';
SET CLUSTER SETTING "kv.raft_log.disable_synchronization_unsafe" = true;
SET CLUSTER SETTING "jobs.registry.interval.gc" = '30s';
SET CLUSTER SETTING "kv.range_merge.queue_interval" = '50ms';
SET CLUSTER SETTING "jobs.retention_time" = '15s';
SET CLUSTER SETTING "jobs.registry.interval.cancel" = '180s';
SET CLUSTER SETTING "sql.stats.automatic_collection.enabled" = false;
ALTER RANGE default CONFIGURE ZONE USING "gc.ttlseconds" = '5';
ALTER DATABASE system CONFIGURE ZONE USING "gc.ttlseconds" = '5';
SET CLUSTER SETTING "kv.raft_log.disable_synchronization_unsafe" = true;
EOF
roachprod wipe local && \
  roachprod put local ./cockroach $COCKROACH_BINARY && \
  roachprod start local --binary $COCKROACH_BINARY  && \
  roachprod sql local < setup.sql && \
  roachprod stop local  && \
  roachprod start local --binary cockroach-hacks --args='--sql-addr=:5436' --skip-init

Followed by:

cargo test -p query-engine-tests 

test result: FAILED. 968 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 188.80s

@ajwerner
Copy link
Contributor

#71899 came out of this

@ajwerner
Copy link
Contributor

Okay, right, so you add the hack to avoid gossiping the system config span into the mix and things start to look pretty good
SET CLUSTER SETTING "sql.catalog.unsafe_skip_system_config_trigger" = true;

test result: FAILED. 968 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 44.77s
test result: FAILED. 968 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 46.84s

@otan
Copy link
Contributor Author

otan commented Oct 23, 2021

nice! i wonder if we can apply this to our activerecord tests and see if they take < 2.5 hours

@otan
Copy link
Contributor Author

otan commented Oct 24, 2021

Also, the tests as they are fail after what I surmise to be about 1/7th

what do you mean by this? yeah there are a few failing tests remaining, i'll get to them eventually.

@ajwerner
Copy link
Contributor

I just meant that the test run is like 1/7th of the 7000 number you quoted above. Want to tell me how to run them all?

@otan
Copy link
Contributor Author

otan commented Oct 24, 2021

oh! revert prisma/prisma-engines@2a4e991 to run all 18000

run SIMPLE_TEST_MODE=on cargo test -p query-engine-tests for the ~7k set

@otan
Copy link
Contributor Author

otan commented Oct 25, 2021

Down to 75mins from 150mins with some of the cluster settings mentioned, as per cockroachdb/activerecord-cockroachdb-adapter#233.

@ajwerner
Copy link
Contributor

With one more commit that is not ready to be seen, this is looking pretty reasonable. It's not <15m, but it's <30m. This was my laptop. It's totally CPU bound. My guess is the next best thing would be to work on the lease draining propagation. I doubt there's more than 10% to get out of that. I took some profiles. The only "smoking gun" is 20% in findrunnable, which might have to do with some of the busy waiting and spinning we do in a bunch of places.

test result: FAILED. 18420 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1706.80s

I'll run it again as I go to dinner to see what it looks like without that fancy jobs change.

@ajwerner
Copy link
Contributor

Okay, seems like this jobs change matters.

@otan
Copy link
Contributor Author

otan commented Oct 27, 2021

is it possible to get a branch with your intermediate fixes @ajwerner ? i can make the prisma suite run against a custom build for now to get the ball rolling and have it be properly regression tested on their side.

@vy-ton
Copy link
Contributor

vy-ton commented Dec 1, 2021

@otan for when you're back, I'd like to understand why we needed to set sql.defaults.propagate_input_ordering.enabled. This is relevant to Queries since we introduced this setting to help proceed with Graphile compatibility. FYI @rytaft

@otan
Copy link
Contributor Author

otan commented Dec 1, 2021

We don't. I think I removed it In later iterations

@ajwerner
Copy link
Contributor

ajwerner commented Jan 6, 2022

We've done a bunch of work here. I'm closing this.

@ajwerner ajwerner closed this as completed Jan 6, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes A-tools-activerecord A-tools-hibernate Issues that pertain to Hibernate integration. A-tools-prisma C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

4 participants