Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider disabling exposed ports in CI to avoid port conflict flakiness #1035

Closed
sarayourfriend opened this issue Mar 27, 2023 · 0 comments
Closed
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🛠 goal: fix Bug fix 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: mgmt Related to repo management and automations

Comments

@sarayourfriend
Copy link
Collaborator

sarayourfriend commented Mar 27, 2023

Problem

CI jobs sometimes fail due to port conflicts.

cf #990 and #200

Description

One way to avoid this is to not bind ports at all in CI. At a glance, I don't think we need to bind ports at all in CI as we never make requests to running containers in CI: everything happens inside the containers and we just check the output.

To implement this we'd need to remove all port declarations from the base docker-compose.yml file and add them in a docker-compose.development.yml as overrides instead. See this comment: docker/compose#3729 (comment)

This is not feasible to do in a single PR and will require multiple discrete PRs of smaller, reviewable chunks of code to accomplish. I tried this in #1036 and learned the hard way that this is actually a very big change. Here is my rough outline of how I would approach this in discrete PRs:

  1. Refactor load_sample_data.sh to run inside the docker-compose networking context.
    • Has benefits aside from this project, for example, new contributors do not have to debug environment issues with the script
    • Rather than refactoring the script directly, we could also change it to be a management command. This would make certain things a lot easier, like tracking ingestion server task status. This is a bigger task to review though.
    • Do not yet remove the just recipes used for waiting for ES, etc, as the ingestion server tests also need them. We can go ahead and remove the just recipes used for making ingestion server requests, provided no one uses them outside the load sample data script.
  2. Update the API to use docker networking rather than host networking
    • Mostly involves updating references to localhost to reference the docker networking name for the service. e.g., localhost:<es-port> becomes es:9200 for the Elasticsearch connection. Etc for Postgres, Redis, and so on.
  3. Update the ingestion server to run inside the docker networking context. Do not modify the tests yet though.
  4. Get ingestion server tests running without host-bound ports. There are two ways to do this. An "easy" way and a very hard way.
    • Easy way: Run the tests inside a docker/compose container. This would spin up the ingestion server tests inside the docker-compose container and not require any host bound ports. The downside here is that we introduce yet another docker image to download to run tests. It may be easier to copy the compose binary into one of the Python image we already use via the docker/compose-bin image. This at least prevents us from needing to download the entire OS for the compose image and just reuse one we already have for existing services.
    • Hard way: Change the way the tests are run to not rely on host-bound networking. The reason this is hard is that the tests currently rely on host bound networking for making necessary schema and data changes to the support databases. These can be done through dc exec or dc run even, but it's a whole heap of work to change this stuff.

There are probably other things that need to be taken into consideration as well, like how the ingestion server gets deployed.

Alternatives

Do the retry approach that @dhruvkb implemented in #990. This could work but is potentially still flaky and adds complexity to the CI.

@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 🛠 goal: fix Bug fix 🤖 aspect: dx Concerns developers' experience with the codebase 🧱 stack: mgmt Related to repo management and automations labels Mar 27, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Mar 27, 2023
dhruvkb pushed a commit that referenced this issue Apr 14, 2023
* first pass with broken tests

* Fix mocking for S3 hook in inaturalist tests

* add test case for mixed s3 update dates

* lint

---------

Co-authored-by: Madison Swain-Bowden <[email protected]>
@sarayourfriend sarayourfriend closed this as not planned Won't fix, can't repro, duplicate, stale Aug 11, 2023
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Openverse Backlog Aug 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🛠 goal: fix Bug fix 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: mgmt Related to repo management and automations
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant