Consider disabling exposed ports in CI to avoid port conflict flakiness #1035
Labels

- 🤖 aspect: dx (concerns developers' experience with the codebase)
- 🛠 goal: fix (bug fix)
- 🟨 priority: medium (not blocking but should be addressed soon)
- 🧱 stack: mgmt (related to repo management and automations)
Problem
CI jobs sometimes fail due to port conflicts.
cf. #990 and #200
Description
One way to avoid this is to not bind ports at all in CI. At a glance, I don't think we need to: we never make requests to running containers in CI; everything happens inside the containers and we just check the output.
To implement this we'd need to remove all port declarations from the base `docker-compose.yml` file and add them in a `docker-compose.development.yml` as overrides instead. See this comment: docker/compose#3729 (comment).
This is not feasible to do in a single PR and will require multiple discrete PRs of smaller, reviewable chunks of code to accomplish. I tried this in #1036 and learned the hard way that this is actually a very big change. Here is my rough outline of how I would approach this in discrete PRs:

1. Update `load_sample_data.sh` to run inside the docker-compose networking context. Change references to `localhost` to use the docker networking name for the service, e.g., `localhost:<es-port>` becomes `es:9200` for the Elasticsearch connection, and likewise for Postgres, Redis, and so on. (See the first sketch after this list.)
2. Run the ingestion server tests through the `docker/compose` container. This would spin up the ingestion server tests inside the docker-compose container and not require any host-bound ports. The downside here is that we introduce yet another docker image to download to run tests. It may be easier to copy the compose binary into one of the Python images we already use via the `docker/compose-bin` image. This at least prevents us from needing to download the entire OS for the compose image and lets us reuse one we already have for existing services. (See the Dockerfile sketch after this list.)
3. Move whatever still runs against the containers from the host over to `dc exec` or `dc run` even, but it's a whole heap of work to change this stuff.
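For step 1, the edits to `load_sample_data.sh` would mostly look like the following (the `curl` line is a hypothetical example to show the shape of the change, not a quote from the script):

```sh
# Before: the script runs on the host and relies on the bound port.
curl "http://localhost:9200/_cat/indices"

# After: the script runs inside the compose network and addresses
# Elasticsearch by its service name instead.
curl "http://es:9200/_cat/indices"
```

The Postgres and Redis connections would change the same way, from `localhost:<port>` to their respective service names.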
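For the `docker/compose-bin` idea in step 2, the sketch below copies the compose binary into a Python image we already pull; the tag and the binary's path inside the image are assumptions that would need verifying:

```dockerfile
FROM python:3.10

# Copy only the compose binary rather than pulling the whole
# docker/compose image (path and tag are assumptions to verify).
COPY --from=docker/compose-bin:latest /docker-compose /usr/local/bin/docker-compose
```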
There are probably other things that need to be taken into consideration as well, like how the ingestion server gets deployed.
Alternatives
Do the retry approach that @dhruvkb implemented in #990. This could work but is potentially still flaky and adds complexity to the CI.
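For reference, that approach is roughly the following shape (a sketch only, not the actual implementation from #990):

```sh
#!/usr/bin/env bash
# Sketch of the retry approach: attempt `docker-compose up` a few
# times to ride out transient port conflicts before failing the job.
for attempt in 1 2 3; do
  if docker-compose up -d; then
    exit 0
  fi
  echo "docker-compose up failed (attempt $attempt), retrying..." >&2
  docker-compose down --remove-orphans
  sleep 5
done
exit 1
```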