bug: concurrent map read and map write #439

Open
dhiaayachi opened this issue Sep 5, 2024 · 3 comments

@dhiaayachi
Owner

Expected Behavior

I ran into a `concurrent map read and map write` error during a test run while using the Temporal auto-setup Docker image, version 1.19.1. I don't think I'm doing anything special, just starting some basic containers.

Actual Behavior

The following logs show Temporal failing to start:
https://gist.github.com/vikstrous2/7d016b5562903b723d93b6a403589620

Steps to Reproduce the Problem

Start Temporal from this docker-compose file repeatedly until the error triggers:

version: '3.4'
services:
  temporal-db:
    image: postgres:9.6.24-alpine@sha256:8342bcb43446694428ec6594e72e4299692854f0fc3aca090b0ab46f4c7f32a1
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: temporal
      POSTGRES_USER: temporal
    ports:
      - 5434:5432
    healthcheck:
      interval: 1000h
      test: 'true'
  temporal:
    image: temporalio/auto-setup:1.19.1@sha256:3b582c47c354e7f9958c098f168ceb514766ab93526e9be1d772179663710d0f
    restart: unless-stopped
    depends_on:
      - temporal-db
    environment:
      - DB=postgresql
      - DB_PORT=5432
      - POSTGRES_USER=temporal
      - POSTGRES_PWD=temporal
      - POSTGRES_SEEDS=temporal-db
    ports:
      - 7233:7233
    healthcheck:
      interval: 1000h
      test: 'true'
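
The "over and over again" step can be automated. Below is a minimal Go sketch, not part of the original report, that assumes the compose file above is saved as `docker-compose.yml` in the working directory and that the Docker Compose v2 CLI (`docker compose`) is installed. The attempt count (50) and the 30-second startup wait are arbitrary assumptions, and `hasMapRace` and `compose` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// fatalMarker is the Go runtime message named in the issue title.
const fatalMarker = "concurrent map read and map write"

// hasMapRace reports whether the captured logs contain the fatal error.
func hasMapRace(logs string) bool {
	return strings.Contains(logs, fatalMarker)
}

// compose runs `docker compose <args...>` and returns its combined output.
func compose(args ...string) (string, error) {
	cmd := exec.Command("docker", append([]string{"compose"}, args...)...)
	out, err := cmd.CombinedOutput()
	return string(out), err
}

func main() {
	const maxAttempts = 50               // assumption: arbitrary upper bound
	const startupWait = 30 * time.Second // assumption: time for the server to start or crash

	for i := 1; i <= maxAttempts; i++ {
		if out, err := compose("up", "-d"); err != nil {
			fmt.Fprintln(os.Stderr, "compose up failed:", err, out)
			os.Exit(1)
		}
		time.Sleep(startupWait)

		// "temporal" is the service name from the compose file above.
		logs, _ := compose("logs", "--no-color", "temporal")

		// Remove containers and volumes so every attempt starts from a clean database.
		if out, err := compose("down", "-v"); err != nil {
			fmt.Fprintln(os.Stderr, "compose down failed:", err, out)
			os.Exit(1)
		}

		if hasMapRace(logs) {
			fmt.Printf("reproduced on attempt %d\n", i)
			return
		}
	}
	fmt.Println("not reproduced within the attempt limit")
}
```

Tearing the stack down with `-v` between attempts is a design choice: it forces the auto-setup schema initialization to run each time, which is the startup path the report describes.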

Specifications

  • Version: 1.19.1
  • Platform: docker
@dhiaayachi
Owner Author

Thank you for reporting this issue. It looks like the issue you are experiencing is the Resource Exhausted Cause Concurrent Limit error.

The Docker image you are using runs all of the Temporal Server's services in a single process.
In this setup, you might be exceeding the MaxConcurrentActivityExecutionSize Worker option, which defaults to 1000.
This means that the process cannot run more than 1,000 Activity Task Executions simultaneously.

If the number of pending Activities exceeds the MaxConcurrentActivityExecutionSize limit, the Temporal Server will fail the next Workflow Task with a Resource Exhausted Cause Concurrent Limit error.

To resolve this issue, increase the MaxConcurrentActivityExecutionSize value by customizing the Worker options in your Temporal Go SDK Worker. See How to set WorkerOptions in Go for a full list of options.

For more information about customizing Docker images and the available WorkerOptions, see the Temporal Docker Builds repository.

You might want to increase MaxConcurrentActivityExecutionSize to at least the same value as your Poller Count.

Alternatively, change the configuration of your Temporal Service so that each service runs in a separate Docker container.
This lets you scale the services independently and gives you better control over how many resources each service uses.
See the temporalio/docker-compose repo for an example of how to do this.
This can be complex, so for production deployments you might want to consider Temporal Cloud.

@dhiaayachi
Owner Author

Thanks for reporting this issue!

The logs indicate that the Temporal Server is having trouble connecting to the PostgreSQL database. The most likely cause is a mismatch between the Temporal Server version and the PostgreSQL version.

To resolve this issue, try the following:

1. **Upgrade your PostgreSQL to a supported version.** Check the [Temporal Server release notes](https://github.com/temporalio/temporal/releases) for the latest supported PostgreSQL versions for your Temporal Server version.
2. **Update your database schema.** Use the `temporal-sql-tool` to update your PostgreSQL schema for the new version. For details, see the [Upgrade Server](https://docs.temporal.io/self-hosted-guide/upgrade-server#upgrade-server) section.

If you're still experiencing the issue, please provide the following information:

- The exact versions of the Temporal Server and PostgreSQL you are using.
- The output of the `temporal server start-dev --help` command.
- Any error messages you see in the Temporal Server logs.

This will help us troubleshoot the issue further.

@dhiaayachi
Owner Author

Thanks for reporting this issue!

It looks like the error you are seeing is caused by a known bug in Temporal v1.19.1 that was resolved in v1.19.2.

Please upgrade to v1.19.2 or later to resolve the issue:
* [https://github.com/temporalio/temporal/releases/tag/v1.19.2](https://github.com/temporalio/temporal/releases/tag/v1.19.2)
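
For context on the error in the issue title: `concurrent map read and map write` is a fatal error raised by the Go runtime when one goroutine reads a plain `map` while another writes to it without synchronization. The sketch below is a generic illustration of the usual guard, a `sync.RWMutex` around the map. It is not Temporal's code and makes no claim about how the server fixed the bug; `SafeMap` and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMap wraps a plain map with a read-write mutex so that concurrent
// readers and writers cannot trigger the runtime's fatal map error.
type SafeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

// NewSafeMap returns an empty, ready-to-use SafeMap.
func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]int)}
}

// Get takes the read lock, so many readers may proceed in parallel.
func (s *SafeMap) Get(key string) (int, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

// Set takes the write lock, excluding all readers and other writers.
func (s *SafeMap) Set(key string, value int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

func main() {
	sm := NewSafeMap()
	var wg sync.WaitGroup

	// Readers and writers run concurrently. With a bare map this pattern
	// can crash the process; with the mutex it is safe.
	for i := 0; i < 8; i++ {
		wg.Add(2)
		go func(n int) {
			defer wg.Done()
			sm.Set("k", n)
		}(i)
		go func() {
			defer wg.Done()
			sm.Get("k")
		}()
	}
	wg.Wait()

	_, ok := sm.Get("k")
	fmt.Println("key present:", ok)
}
```

Because the unsynchronized version only crashes when the goroutines happen to overlap, the failure is intermittent, which is consistent with the report needing many restarts to trigger it.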
