Deadlock when running with COMPOSE_PARALLEL_LIMIT #5864

Closed

bcoughlan opened this issue Apr 10, 2018 · 8 comments · May be fixed by ko10ok/compose#1 or ko10ok/compose#2

bcoughlan commented Apr 10, 2018

Description of the issue

Compose can hang when trying to run tasks with a low COMPOSE_PARALLEL_LIMIT.

Suppose I'm starting 10 containers with a parallel limit of 3:

The first 3 containers begin starting. Then, in service.py:_execute_convergence_create, another parallel task is kicked off to actually start the containers. However, because the thread pool is already full, this task never executes and the application hangs.
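
To make the failure mode concrete, here is a minimal sketch of the pattern (not Compose's actual implementation), using a bounded ThreadPoolExecutor to stand in for the parallel limit; the service names are illustrative:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=3)  # stands in for COMPOSE_PARALLEL_LIMIT=3

def converge(service):
    # Mirrors the nested task: the outer task submits more work to the same
    # pool and then waits for it, holding on to its worker the whole time.
    inner = pool.submit(lambda: f"{service} started")
    return inner.result()  # blocks this worker until the inner task runs

futures = [pool.submit(converge, f"redis{i}") for i in range(1, 10)]
# All 3 workers end up blocked in converge(), and the inner tasks are queued
# behind the remaining outer tasks, so nothing can ever make progress.
# [f.result() for f in futures]  # hangs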

I think either:

  1. The service.py code needs a separate thread pool (complicated).
  2. Multiple instances of the same service need to start sequentially (inefficient for certain deployments).
  3. The parallel logic could be contained in project.py by running a task for each instance of a service.

In cases where parallel_execute is passed an objects parameter of length 1, could it just execute it on the calling thread? That would at least limit the issue to services with scale > 1.
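
A rough sketch of that last suggestion, with a deliberately simplified signature (the real parallel_execute takes more arguments than shown here):

from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=3)

def parallel_execute(objects, func):
    if len(objects) == 1:
        # Run single-object batches inline on the calling thread so they
        # never consume a pool worker.
        return [func(objects[0])]
    futures = [_pool.submit(func, obj) for obj in objects]
    return [f.result() for f in futures]

Nested calls that pass more than one object could still deadlock, which is why this would only narrow the problem to services with scale > 1.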

Context information (for bug reports)

Tested on master (2975f06 at time of writing).

$ docker-compose --version
docker-compose version 1.21.0dev, build unknown

Steps to reproduce the issue

Below is a Compose file that starts 9 instances of Redis. Run with COMPOSE_PARALLEL_LIMIT=3 docker-compose up to observe the issue:

version: '2.3'

services:
  redis1:
    image: "redis:alpine"
    ports:
      - "6379:6379"
  redis2:
    image: "redis:alpine"
    ports:
      - "6380:6379"
  redis3:
    image: "redis:alpine"
    ports:
      - "6381:6379"
  redis4:
    image: "redis:alpine"
    ports:
      - "6382:6379"
  redis5:
    image: "redis:alpine"
    ports:
      - "6383:6379"
  redis6:
    image: "redis:alpine"
    ports:
      - "6384:6379"
  redis7:
    image: "redis:alpine"
    ports:
      - "6385:6379"
  redis8:
    image: "redis:alpine"
    ports:
      - "6386:6379"
  redis9:
    image: "redis:alpine"
    ports:
      - "6387:6379"

shin- commented Apr 13, 2018

Thanks for the report! It's something we can look into. Obviously, the simple workaround is to just set the parallel limit to a higher value.

bcoughlan (Author) commented

Thanks for the reply. In my case I'm starting about 15 Java containers that are CPU-heavy on startup, and I was experimenting with limiting concurrency to avoid maxing out the CPU. Bringing them all up in parallel takes much longer to reach healthy than starting them in sequence.

I'm guessing that is the purpose of the concurrency limit flag? Having thought about it more, I reckon the gap is in Docker rather than Compose: the throttling would need to be done by Compose and Swarm, but also by Docker itself at system boot time when it starts many containers with restart=always.


bjsee commented Aug 11, 2018

Hi, we have this issue too. We are starting about 20 Java containers and want to reduce the number of containers starting in parallel, because otherwise the system slows down in the same way bcoughlan mentioned. Today we work around this by defining depends_on chains, but that is really ugly because we cannot replace a single container without restarting all dependent containers.

So it would be really cool to get the COMPOSE_PARALLEL_LIMIT feature working. Is there anything new on this issue?

bcoughlan (Author) commented

@bjsee Even without this bug, COMPOSE_PARALLEL_LIMIT wouldn't help much, as it doesn't wait for Docker healthchecks. I think the responsibility lies with the main Docker project, because the same issue occurs when you reboot your server and Docker starts up all the containers in parallel.

As yet I haven't found a solution to the problem.


Alexhha commented Aug 29, 2018

As a temporary workaround, you can combine healthchecks and depends_on (with a condition). For example, to run 3 containers at a time, you can try something like the following:

version: '2.3'

services:
  redis1:
    image: "redis:alpine"
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "..."]

  redis2:
    image: "redis:alpine"
    ports:
      - "6380:6379"
    healthcheck:
      test: ["CMD", "..."]

  redis3:
    image: "redis:alpine"
    ports:
      - "6381:6379"
    healthcheck:
      test: ["CMD", "..."]

  redis4:
    image: "redis:alpine"
    ports:
      - "6382:6379"
    healthcheck:
      test: ["CMD", "..."]
    depends_on:
      redis1:
        condition: service_healthy
      redis2:
        condition: service_healthy
      redis3:
        condition: service_healthy

  redis5:
    image: "redis:alpine"
    ports:
      - "6383:6379"
    healthcheck:
      test: ["CMD", "..."]
    depends_on:
      redis1:
        condition: service_healthy
      redis2:
        condition: service_healthy
      redis3:
        condition: service_healthy

  redis6:
    image: "redis:alpine"
    ports:
      - "6384:6379"
    healthcheck:
      test: ["CMD", "..."]
    depends_on:
      redis1:
        condition: service_healthy
      redis2:
        condition: service_healthy
      redis3:
        condition: service_healthy

Of course this method has disadvantages: you have to distribute the workload manually, and if you later want to start 4 containers in parallel you have to rework the whole Compose file. But I think it could be automated with simple bash scripts. It's up to you.
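
As a rough illustration of that kind of automation, here is a sketch in Python rather than bash; the service names, ports, batch size and the redis-cli ping healthcheck are placeholders, and it assumes PyYAML is installed:

import yaml  # PyYAML

def batched_compose(n_services=9, batch=3):
    # Each batch of `batch` services waits for the previous batch to become
    # healthy, which caps how many containers are starting at any one time.
    services = {}
    for i in range(1, n_services + 1):
        svc = {
            "image": "redis:alpine",
            "ports": [f"{6378 + i}:6379"],
            # Placeholder healthcheck; substitute whatever fits your service.
            "healthcheck": {"test": ["CMD", "redis-cli", "ping"]},
        }
        prev_start = ((i - 1) // batch - 1) * batch + 1
        if prev_start >= 1:
            svc["depends_on"] = {
                f"redis{j}": {"condition": "service_healthy"}
                for j in range(prev_start, prev_start + batch)
            }
        services[f"redis{i}"] = svc
    return yaml.safe_dump({"version": "2.3", "services": services}, sort_keys=False)

if __name__ == "__main__":
    print(batched_compose())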

electrofelix commented

Trying to use COMPOSE_PARALLEL_LIMIT to prevent a "device or resource busy" error that occasionally appears in our CI, and running into the same problem.


stale bot commented Oct 9, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Oct 9, 2019

stale bot commented Oct 16, 2019

This issue has been automatically closed because it has not had recent activity during the stale period.
