
[Bug] [seatunnel-engine-server] When slot allocation fails due to insufficient resources, already-allocated slots are not released #6761

Closed
2 of 3 tasks
liangcw1111 opened this issue Apr 26, 2024 · 6 comments · Fixed by #6763

Comments

@liangcw1111

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

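With seatunnel.engine.slot-service.slot-num=5, I submitted a job that needs 6 slots. The first 5 slot requests succeeded, and the last one threw NoEnoughResourceException because resources were insufficient. The job then failed and terminated, but the 5 successfully allocated slots were never released.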
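The expected behavior is all-or-nothing: if any slot request in a batch fails, the slots already granted should be handed back. A minimal sketch of that rollback pattern follows. `SlotPool` and `AllOrNothingAllocator` are hypothetical names, and the `NoEnoughResourceException` here is a local stand-in, not SeaTunnel's class; the real engine allocates slots asynchronously across workers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Local stand-in for the engine's exception; not the real SeaTunnel class.
class NoEnoughResourceException extends RuntimeException {
    NoEnoughResourceException(String message) {
        super(message);
    }
}

// Hypothetical fixed-size pool, mimicking slot-num: 5 with dynamic-slot: false.
class SlotPool {
    private final int capacity;
    private final Set<Integer> inUse = new HashSet<>();
    private int nextId = 0;

    SlotPool(int capacity) {
        this.capacity = capacity;
    }

    int allocate() {
        if (inUse.size() >= capacity) {
            throw new NoEnoughResourceException(
                    "no free slot: " + inUse.size() + "/" + capacity + " in use");
        }
        int id = nextId++;
        inUse.add(id);
        return id;
    }

    void release(int id) {
        inUse.remove(id);
    }

    int usedSlots() {
        return inUse.size();
    }
}

class AllOrNothingAllocator {
    // Requests `needed` slots one by one. If any request fails, every slot
    // granted so far is returned to the pool before the exception propagates.
    static List<Integer> allocateAll(SlotPool pool, int needed) {
        List<Integer> granted = new ArrayList<>();
        try {
            for (int i = 0; i < needed; i++) {
                granted.add(pool.allocate());
            }
            return granted;
        } catch (RuntimeException e) {
            for (int id : granted) {
                pool.release(id);
            }
            throw e;
        }
    }
}
```

With `new SlotPool(5)`, asking for 6 slots throws, and `usedSlots()` is back to 0 afterwards instead of staying at 5 as described in this issue.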

SeaTunnel Version

2.3.4

SeaTunnel Config

seatunnel:
  engine:
    classloader-cache-mode: true
    backup-count: 1
    print-execution-info-interval: 120
    print-job-metrics-info-interval: 10
    queue-type: blockingqueue
    slot-service:
      dynamic-slot: false
      slot-num: 5
    checkpoint:
      interval: 30000
      timeout: 21474836460
      max-concurrent: 10
      tolerable-failure: 2

Running Command

java -Dseatunnel.config=/alidata1/za-seatunnel/apache-seatunnel-2.3.4-SNAPSHOT/config/seatunnel.yaml -Dhazelcast.config=/alidata1/za-seatunnel/apache-seatunnel-2.3.4-SNAPSHOT/config/hazelcast.yaml -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j2.configurationFile=/alidata1/za-seatunnel/apache-seatunnel-2.3.4-SNAPSHOT/config/log4j2.properties -Dseatunnel.logs.path=/alidata1/za-seatunnel/apache-seatunnel-2.3.4-SNAPSHOT/logs -Dseatunnel.logs.file_name=seatunnel-engine-server -Xrunjdwp:server=y,transport=dt_socket,address=5001,suspend=y -Xms3g -Xmx3g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/seatunnel/dump/zeta-server -XX:MaxMetaspaceSize=1g -XX:+UseG1GC -XX:+PrintGCDetails -Xloggc:/alidata1/za-seatunnel/logs/gc.log -XX:+PrintGCDateStamps -XX:MaxGCPauseMillis=3000 -cp /alidata1/za-seatunnel/apache-seatunnel-2.3.4-SNAPSHOT/lib/*:/alidata1/za-seatunnel/apache-seatunnel-2.3.4-SNAPSHOT/starter/seatunnel-starter.jar org.apache.seatunnel.core.starter.seatunnel.SeaTunnelServer -d

Error Exception

Successfully allocated resources are not released when NoEnoughResourceException is thrown.

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

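@liangcw1111 liangcw1111 changed the title [Bug] [Module Name] When slot allocation fails due to insufficient resources, already-allocated slots are not released [Bug] [seatunnel-engine-server] When slot allocation fails due to insufficient resources, already-allocated slots are not released Apr 26, 2024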
@VincentSleepless

@hailin0 Hailin, could you please take a look?

@hailin0
Member

hailin0 commented Apr 26, 2024

cc @Hisoka-X

@liunaijie
Member

Please assign this to me for now, thanks.

@liangcw1111
Author

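After the CANCELING logic in SubPlan has finished, assignedSlots still exist in registerWorker of the jobMaster's resourceManager.
Tracing the cancel logic of PhysicalVertex, I found that checkTaskGroupIsExecuting returns false, so the state is changed directly to CANCELED and the CancelTaskOperation logic is never executed.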
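The finding above suggests that the cancel path should release the slot on both branches, not only when the task group is executing. A minimal sketch of that idea follows, assuming a single map of assigned slots; `SketchVertex`, `SlotBook`, and `VertexState` are hypothetical names, and the `executing` flag only stands in for the result of checkTaskGroupIsExecuting. This is not SeaTunnel's PhysicalVertex.

```java
import java.util.HashMap;
import java.util.Map;

enum VertexState { CREATED, RUNNING, CANCELING, CANCELED }

// Hypothetical stand-in for the resource manager's assigned-slot bookkeeping.
class SlotBook {
    final Map<String, Integer> assignedSlots = new HashMap<>();
}

// Hypothetical, heavily simplified vertex; not SeaTunnel's PhysicalVertex.
class SketchVertex {
    final String taskGroupId;
    final SlotBook book;
    VertexState state = VertexState.CREATED;
    boolean executing;  // stand-in for the result of checkTaskGroupIsExecuting()
    boolean cancelSent; // records whether the remote-cancel branch ran

    SketchVertex(String taskGroupId, SlotBook book, boolean executing) {
        this.taskGroupId = taskGroupId;
        this.book = book;
        this.executing = executing;
    }

    void cancel() {
        state = VertexState.CANCELING;
        if (executing) {
            // Placeholder for asking the worker to stop the running task group.
            cancelSent = true;
            executing = false;
        }
        // Release the slot on both branches: a task group that never started
        // executing can still hold a slot assigned during scheduling.
        book.assignedSlots.remove(taskGroupId);
        state = VertexState.CANCELED;
    }
}
```

A vertex created with `executing = false` skips the remote-cancel branch but still clears its entry from `assignedSlots`, which is the step the trace above shows being skipped.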

@liunaijie
Member

Some more findings:
In releasePipelineResource, the slot profile cannot be retrieved, so the resources are never released.

@liunaijie
Member


This IMap entry is only put when the pipeline reaches SCHEDULED status, that is, after resource allocation succeeds. In this case allocation failed, so the put method is never called.
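One way to close this gap is to record each slot in the map as soon as it is granted, so releasePipelineResource can find partially allocated slots too. The sketch below assumes that approach; `PipelineSlots` and `applyResources` are hypothetical names, and a `ConcurrentHashMap` stands in for the IMap. The actual fix in #6763 may take a different route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; a ConcurrentHashMap stands in for the IMap that
// releasePipelineResource reads slot information from.
class PipelineSlots {
    final Map<String, List<Integer>> ownedSlots = new ConcurrentHashMap<>();
    private int freeSlots;
    private int nextSlotId = 0;

    PipelineSlots(int freeSlots) {
        this.freeSlots = freeSlots;
    }

    int freeSlots() {
        return freeSlots;
    }

    // Records every slot as soon as it is granted, rather than only once the
    // pipeline reaches SCHEDULED, so a failed allocation still leaves a record.
    boolean applyResources(String pipelineId, int needed) {
        for (int i = 0; i < needed; i++) {
            if (freeSlots == 0) {
                return false; // caller is expected to call releasePipelineResource
            }
            freeSlots--;
            // Copy, modify, put back: a distributed map such as an IMap typically
            // returns copies, so mutating the returned list would not persist.
            List<Integer> slots =
                    new ArrayList<>(ownedSlots.getOrDefault(pipelineId, new ArrayList<>()));
            slots.add(nextSlotId++);
            ownedSlots.put(pipelineId, slots);
        }
        return true;
    }

    // Releases whatever was recorded, whether or not allocation completed.
    void releasePipelineResource(String pipelineId) {
        List<Integer> slots = ownedSlots.remove(pipelineId);
        if (slots != null) {
            freeSlots += slots.size();
        }
    }
}
```

With 5 free slots, `applyResources("pipeline-1", 6)` returns false but leaves 5 recorded slots, and a following `releasePipelineResource("pipeline-1")` returns all of them to the pool.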

