The LoadTests Go GBK Flink Batch job is flaky #30507

Open
github-actions bot opened this issue Mar 5, 2024 · 8 comments · Fixed by #33754 or #33898


github-actions bot commented Mar 5, 2024

The LoadTests Go GBK Flink Batch job is failing more than 50% of the time.
Please visit https://github.com/apache/beam/actions/workflows/beam_LoadTests_Go_GBK_Flink_Batch.yml?query=is%3Afailure+branch%3Amaster to see the logs.

@volatilemolotov (Contributor)

I tried increasing the timeout to almost 12 hours, but the job still times out:
https://github.com/volatilemolotov/beam/actions/runs/8627370677/job/23647207749

github-actions bot added this to the 2.59.0 Release milestone Aug 20, 2024
github-actions bot reopened this Aug 23, 2024

@github-actions (Contributor, Author)

Reopening since the workflow is still flaky

damccorm removed this from the 2.59.0 Release milestone Aug 23, 2024
liferoad self-assigned this Nov 19, 2024

@liferoad (Contributor)

Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
	at org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:207)
	at org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:181)
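The trace shows Beam's `DockerCommand` failing to start the `docker` binary, presumably because the job was submitted with `--environment_type=DOCKER` (as in the commands later in this thread) and the process that launches the SDK harness had no `docker` on its PATH. Below is a minimal sketch of a preflight check, assuming it runs on the same host as that process. `requireBinary` is a hypothetical helper, not a Beam API, and finding the binary does not prove the Docker daemon is reachable.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requireBinary returns an error if the named executable cannot be found on
// PATH. Hypothetical helper for illustration; it is not part of Beam.
func requireBinary(name string) error {
	path, err := exec.LookPath(name)
	if err != nil {
		return fmt.Errorf("%q not found on PATH: %w", name, err)
	}
	fmt.Printf("found %s at %s\n", name, path)
	return nil
}

func main() {
	// With --environment_type=DOCKER the host that starts the SDK harness
	// needs a docker CLI, so check for it before submitting the job.
	// A CI step would treat this error as fatal; the sketch only reports it.
	if err := requireBinary("docker"); err != nil {
		fmt.Println("preflight failed:", err)
		return
	}
	fmt.Println("preflight ok")
}
```

Failing fast here would surface a missing Docker installation as a clear setup error rather than as a job failure or timeout.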

liferoad commented Nov 21, 2024

Tested this locally:

./gradlew :sdks:go:test:load:run -PloadTest.mainClass=group_by_key -Prunner=FlinkRunner -PloadTest.args='--influx_namespace=flink --influx_measurement=go_batch_gbk_1 --input_options="{\"num_records\":200000000,\"key_size\":1,\"value_size\":9}" --iterations=1 --fanout=1 --parallelism=5 --endpoint=localhost:8099 --environment_type=DOCKER --environment_config=gcr.io/apache-beam-testing/beam-sdk/beam_go_sdk:latest --runner=FlinkRunner'
2024/11/21 21:58:01 Failed to execute job:      connecting to job service
failed to dial server at localhost:8099
        caused by:
context deadline exceeded
panic: Failed to execute job:   connecting to job service
        failed to dial server at localhost:8099
                caused by:
        context deadline exceeded

goroutine 1 [running]:
github.com/apache/beam/sdks/v2/go/pkg/beam/log.Fatalf({0x234e280, 0x3bc7c60}, {0x21193ff?, 0x3bc7c60?}, {0xc00078ff28?, 0x0?, 0x0?})
        /usr/local/google/home/xqhu/Dev/beam/sdks/go/pkg/beam/log/log.go:162 +0x7d
main.main()
        /usr/local/google/home/xqhu/Dev/beam/sdks/go/test/load/group_by_key/group_by_key.go:98 +0x3c9

> Task :sdks:go:test:load:run FAILED

FAILURE: Build failed with an exception.

@liferoad (Contributor)

[image]

liferoad added a commit that referenced this issue Nov 22, 2024
From #30507 (comment), try to use the default machine types for Flink with more memory.
liferoad commented Nov 23, 2024

Steps to run a local test:

  1. Run a local Flink cluster:
wget https://downloads.apache.org/flink/flink-1.17.2/flink-1.17.2-bin-scala_2.12.tgz
tar zxvf flink-1.17.2-bin-scala_2.12.tgz
cd flink-1.17.2
./bin/start-cluster.sh
  2. Run the job server:
docker run --net=host gcr.io/apache-beam-testing/beam_portability/beam_flink1.17_job_server --flink-master=localhost:8081
  3. Run a Go test:
./gradlew :sdks:go:test:load:run -PloadTest.mainClass=group_by_key -Prunner=FlinkRunner -PloadTest.args='--influx_namespace=flink --influx_measurement=go_batch_gbk_1 --input_options="{\"num_records\":200,\"key_size\":1,\"value_size\":9}" --iterations=1 --fanout=1 --parallelism=1 --endpoint=localhost:8099 --environment_type=DOCKER --environment_config=gcr.io/apache-beam-testing/beam-sdk/beam_go_sdk --runner=PortableRunner'
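The `--input_options` value above is JSON hand-escaped inside a single-quoted shell string. Below is a minimal sketch that builds the same JSON with Go's `encoding/json` and prints a rough raw-size estimate. The `inputOptions` struct and helper names are hypothetical, and the estimate assumes `key_size` and `value_size` are per-record byte counts, ignoring encoding overhead. Under that assumption, the 200,000,000-record configuration from the Nov 21 run comes to 2,000,000,000 bytes (about 2 GB) of raw keys and values, versus 2,000 bytes for this 200-record smoke test.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inputOptions mirrors the keys passed in --input_options in this thread.
// The struct itself is hypothetical; only the JSON keys come from the commands.
type inputOptions struct {
	NumRecords int64 `json:"num_records"`
	KeySize    int64 `json:"key_size"`
	ValueSize  int64 `json:"value_size"`
}

// rawBytes is a rough estimate of generated key+value bytes, assuming the
// sizes are in bytes and ignoring encoding and per-record overhead.
func (o inputOptions) rawBytes() int64 {
	return o.NumRecords * (o.KeySize + o.ValueSize)
}

// inputOptionsJSON renders the options as compact JSON, so they do not have
// to be hand-escaped inside a shell string.
func inputOptionsJSON(o inputOptions) (string, error) {
	b, err := json.Marshal(o)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, o := range []inputOptions{
		{NumRecords: 200, KeySize: 1, ValueSize: 9},       // local smoke test
		{NumRecords: 200000000, KeySize: 1, ValueSize: 9}, // Nov 21 local run
	} {
		s, err := inputOptionsJSON(o)
		if err != nil {
			panic(err)
		}
		// When pasted into a shell command, the JSON still needs quoting.
		fmt.Printf("--input_options=%s (~%d raw bytes)\n", s, o.rawBytes())
	}
}
```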

Amar3tto self-assigned this Jan 24, 2025
github-actions bot added this to the 2.63.0 Release milestone Jan 24, 2025
github-actions bot reopened this Jan 27, 2025

@github-actions (Contributor, Author)

Reopening since the workflow is still flaky

github-actions bot reopened this Feb 10, 2025

@github-actions (Contributor, Author)

Reopening since the workflow is still flaky
