Skip to content
This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Restructure codebase to be more monorepo-friendly #1094

Merged
merged 25 commits into from
Apr 14, 2023
Merged

Restructure codebase to be more monorepo-friendly #1094

merged 25 commits into from
Apr 14, 2023

Conversation

dhruvkb
Copy link
Member

@dhruvkb dhruvkb commented Apr 7, 2023

This PR prepares the WordPress/openverse-catalog repository to be merged into the WordPress/openverse repository.

Making it a PR helps verify that everything continues to work (like tests and CI) and also make the changes easier to review.

@dhruvkb dhruvkb added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 🤖 aspect: dx Concerns developers' experience with the codebase 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Apr 7, 2023
@dhruvkb dhruvkb requested a review from a team as a code owner April 7, 2023 00:50
@dhruvkb dhruvkb requested review from AetherUnbound and stacimc April 7, 2023 00:50
@dhruvkb dhruvkb marked this pull request as draft April 7, 2023 00:50
@dhruvkb dhruvkb force-pushed the monorepo_prep branch 3 times, most recently from 4f3a2ae to 53c1ea3 Compare April 7, 2023 01:34
@dhruvkb dhruvkb marked this pull request as ready for review April 7, 2023 15:26
This was causing an issue where CLI parameters to pytest (e.g. `just catalog/test --extended`) were simply ignored because the parent "bash -c '<foo>'" command didn't end up having the final quotes. The change to the `run` recipe here should address that issue.
@AetherUnbound
Copy link
Contributor

@dhruvkb I believe I've addressed the issues with the tests! We should now see the --extended flag respected on CI as well, which we can check by making sure not tests are skipped. I want to point out, specifically, the change I made in c833356 - after lots of debugging, I noticed that the quotes around bash -c 'pytest {{ args {{' were actually disappearing when the docker-compose run command was finally called. Here's an example before the change:

j catalog/test --extended
just _mount-test "bash -c \'pytest --extended\'"
env COMPOSE_PROFILES="catalog_dependencies" just ../up 
env COMPOSE_PROFILES="catalog_dependencies" docker-compose -f docker-compose.yml up -d
Recreating openverse-catalog_s3_1 ... 
Recreating openverse-catalog_s3_1 ... done
Recreating openverse-catalog_load_to_s3_1 ... done
just ../run --rm -e AIRFLOW_VAR_INGESTION_LIMIT=1000000 -w /opt/airflow/catalog --volume /home/aether/git-a8c/openverse-catalog/catalog/../docker:/opt/airflow/docker/ scheduler bash -c \'pytest --extended\'
just dc run -u airflow  --rm -e AIRFLOW_VAR_INGESTION_LIMIT=1000000 -w /opt/airflow/catalog --volume /home/aether/git-a8c/openverse-catalog/catalog/../docker:/opt/airflow/docker/ scheduler bash -c 'pytest --extended'
env COMPOSE_PROFILES="api,ingestion_server,frontend,catalog" docker-compose -f docker-compose.yml run -u airflow --rm -e AIRFLOW_VAR_INGESTION_LIMIT=1000000 -w /opt/airflow/catalog --volume /home/aether/git-a8c/openverse-catalog/catalog/../docker:/opt/airflow/docker/ scheduler bash -c pytest --extended

Note that the last line shows as bash -c pytest --extended, meaning --extended (and any other args) get handed to bash rather than the pytest command. Here's what that same command looks like after my change:

j catalog/test --extended
just _mount-test "bash -c \'pytest --extended\'"
env COMPOSE_PROFILES="catalog_dependencies" just ../up 
env COMPOSE_PROFILES="catalog_dependencies" docker-compose -f docker-compose.yml up -d
openverse-catalog_postgres_1 is up-to-date
openverse-catalog_s3_1 is up-to-date
Starting openverse-catalog_load_to_s3_1 ... done
just ../run --rm -e AIRFLOW_VAR_INGESTION_LIMIT=1000000 -w /opt/airflow/catalog --volume /home/aether/git-a8c/openverse-catalog/catalog/../docker:/opt/airflow/docker/ scheduler bash -c \'pytest --extended\'
just dc run -u airflow  "--rm -e AIRFLOW_VAR_INGESTION_LIMIT=1000000 -w /opt/airflow/catalog --volume /home/aether/git-a8c/openverse-catalog/catalog/../docker:/opt/airflow/docker/ scheduler bash -c 'pytest --extended'"
env COMPOSE_PROFILES="api,ingestion_server,frontend,catalog" docker-compose -f docker-compose.yml run -u airflow --rm -e AIRFLOW_VAR_INGESTION_LIMIT=1000000 -w /opt/airflow/catalog --volume /home/aether/git-a8c/openverse-catalog/catalog/../docker:/opt/airflow/docker/ scheduler bash -c 'pytest --extended'

The quotes around 'pytest --extended' now ensure that all extra CLI args handed at the very top make their way down to pytest. Very odd behavior! I want to take a look at the cache warnings too, so I might push one more commit.

@AetherUnbound
Copy link
Contributor

Okay I think I've got it rectified 😄

@dhruvkb
Copy link
Member Author

dhruvkb commented Apr 8, 2023

Thank you @AetherUnbound for fixing nearly every shortcoming left. Also both you and @stacimc for the super-fast turnaround on the reviews.

Should we merge this into main or only consider this branch as the base for the monorepo merge and leave main unaffected? I'm inclined towards the latter so that our prod and infra can work unchanged while we move the code around.

@dhruvkb dhruvkb requested review from AetherUnbound and stacimc April 8, 2023 06:58
@AetherUnbound
Copy link
Contributor

I'm just realizing that we have a few community PRs (#999, #1055, #1059) that we should probably merge before we merge this one 😅 I'd be okay merging this into main after we merge those and update this PR! What we could do is disable the dag sync cron job on production, at least until we update the infrastructure repo and deploy a new version of the catalog. We're also waiting on the image data refresh right now which currently has a 3 week timeout, so there's not really a rush to deploy a new version - thus it might make more sense to be working off of main while we're at it. What do you think?

@dhruvkb
Copy link
Member Author

dhruvkb commented Apr 9, 2023

I agree with all your points.

  • We should merge the community PRs before this, then update this PR with those changes.
  • If merging to main doesn't cause any breakages, I'm all for it. It'll probably take one week to fix all the usages with references to the monorepo so if there is no need to deploy in that time, we can merge.

@AetherUnbound
Copy link
Contributor

Excellent! I will shut the dag-sync cron off once we've merged all the community PRs 🙂

Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving so this is not blocked when time comes to merge, beyond that it's ready to go!

Copy link
Contributor

@stacimc stacimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of my local testing has gone great! Excited for this 🚀

Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested all of the commands and a couple of DAGs after the rebase - everything looks good! I only have one comment but after that this can be merged 🥳

@dhruvkb dhruvkb changed the title Prepare repository for merging into the monorepo Restructure codebase to be more monorepo-friendly Apr 14, 2023
@dhruvkb dhruvkb merged commit aedc9c1 into main Apr 14, 2023
@dhruvkb dhruvkb deleted the monorepo_prep branch April 14, 2023 02:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🤖 aspect: dx Concerns developers' experience with the codebase ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants