Add maximum timeout protection #946
service/frontend/workflowHandler.go
Outdated
@@ -1349,11 +1349,21 @@ func (wh *WorkflowHandler) StartWorkflowExecution(
Message: "A valid ExecutionStartToCloseTimeoutSeconds is not set on request."}, scope)
}

if startRequest.GetExecutionStartToCloseTimeoutSeconds() > common.MaxWorkflowTimeout {
I think we decided not to restrict the workflow timeout and instead to cap the decision task timeout at 1 year.
Updated.
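For illustration, the one-year cap on the decision task timeout suggested above could look roughly like the sketch below. The constant `maxDecisionTimeoutSeconds` and the helper `capDecisionTimeout` are hypothetical names, not taken from this PR, and whether the final change reduces the value or rejects the request is not visible in this thread; the sketch assumes the value is reduced.

```go
package main

import "fmt"

// maxDecisionTimeoutSeconds is a hypothetical cap of one year (365 days).
// The name and the exact value are assumptions made for this sketch.
const maxDecisionTimeoutSeconds int32 = 365 * 24 * 60 * 60

// capDecisionTimeout returns the timeout to use and reports whether the
// requested value had to be reduced to the cap.
func capDecisionTimeout(requested int32) (int32, bool) {
	if requested > maxDecisionTimeoutSeconds {
		return maxDecisionTimeoutSeconds, true
	}
	return requested, false
}

func main() {
	for _, requested := range []int32{10, 40000000} {
		effective, capped := capDecisionTimeout(requested)
		fmt.Printf("requested=%d effective=%d capped=%t\n", requested, effective, capped)
	}
}
```

Returning a flag alongside the value lets the caller decide whether to log a warning when the cap is applied.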
service/frontend/workflowHandler.go
Outdated
if startRequest.GetTaskStartToCloseTimeoutSeconds() <= 0 {
return nil, wh.error(&gen.BadRequestError{
Message: "A valid TaskStartToCloseTimeoutSeconds is not set on request."}, scope)
}

if startRequest.GetTaskStartToCloseTimeoutSeconds() > startRequest.GetExecutionStartToCloseTimeoutSeconds() {
I think we should limit the decision task timeout to some reasonable value. Let's discuss this.
service/history/historyEngine.go
Outdated
@@ -2449,6 +2449,13 @@ func validateActivityScheduleAttributes(attributes *workflow.ScheduleActivityTas
return &workflow.BadRequestError{Message: "A valid timeout may not be negative."}
}

if attributes.GetScheduleToCloseTimeoutSeconds() > common.MaxWorkflowTimeout ||
This should probably compare the activity timeout against the workflow timeout.
Makes sense, changed.
common/logging/helpers.go
Outdated
"Domain": domain,
"WorkflowID": wid,
"WorkflowType": wfType,
}).Warnf("Decision timeout %d is too large", t)
Either use an eventID for this log, or don't use Warnf and add the timeout as a tag.
Changed
@@ -150,7 +152,7 @@ func (v *cassandraVisibilityPersistence) Close() {

 func (v *cassandraVisibilityPersistence) RecordWorkflowExecutionStarted(
 request *RecordWorkflowExecutionStartedRequest) error {
-ttl := request.WorkflowTimeout + openExecutionTTLBuffer
+ttl := common.MinInt64(request.WorkflowTimeout+openExecutionTTLBuffer, maxCassandraTTL)
The open visibility record will be deleted after this TTL. It would be very confusing to allow a WorkflowTimeout greater than 20 years while keeping the visibility record for at most 20 years. I think we should just drop the TTL part from the query if the workflow timeout is bigger than the supported TTL.
That is better, changed.
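A minimal sketch of the "drop the TTL" idea from this thread, assuming the decision is made on the buffered TTL value. The constant names `maxCassandraTTL` and `openExecutionTTLBuffer` come from the diff, but the buffer value used here is an assumption, and the helpers `openExecutionTTL` and `insertStatement`, as well as the CQL text and table name, are hypothetical and only illustrate the branching.

```go
package main

import "fmt"

const (
	// maxCassandraTTL is the Cassandra TTL maximum of 20 years, in seconds.
	maxCassandraTTL int64 = 630720000
	// openExecutionTTLBuffer is named in the diff; one day is an assumed value.
	openExecutionTTLBuffer int64 = 86400
)

// openExecutionTTL returns the TTL to attach to the open visibility record
// and reports whether a TTL should be used at all. When the buffered timeout
// exceeds what Cassandra supports, no TTL is applied.
func openExecutionTTL(workflowTimeout int64) (int64, bool) {
	ttl := workflowTimeout + openExecutionTTLBuffer
	if ttl > maxCassandraTTL {
		return 0, false
	}
	return ttl, true
}

// insertStatement picks an illustrative CQL statement with or without the
// TTL clause. The table and column names are placeholders.
func insertStatement(workflowTimeout int64) string {
	const base = "INSERT INTO open_executions (workflow_id) VALUES (?)"
	if _, ok := openExecutionTTL(workflowTimeout); ok {
		return base + " USING TTL ?"
	}
	return base
}

func main() {
	fmt.Println(insertStatement(3600))
	fmt.Println(insertStatement(maxCassandraTTL))
}
```

With this shape, a record for a very long workflow is written without expiry instead of being silently deleted after 20 years.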
service/history/historyEngine.go
Outdated
@@ -2463,6 +2463,13 @@ func validateActivityScheduleAttributes(attributes *workflow.ScheduleActivityTas
return &workflow.BadRequestError{Message: "A valid timeout may not be negative."}
}

if attributes.GetScheduleToCloseTimeoutSeconds() > wfTimeout ||
Rather than failing decision tasks, can we reduce the timeout to be the same as the workflow timeout?
I'm worried that failing them might cause issues in production.
Makes sense, changed.
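The clamping approach agreed on here could be sketched as follows. The struct `activityTimeouts` and the helper `clampToWorkflowTimeout` are hypothetical stand-ins for the real decision attributes and validation code, and the set of timeout fields is an assumption, since the diff only shows the schedule-to-close comparison.

```go
package main

import "fmt"

// activityTimeouts is a hypothetical stand-in for the timeout fields of the
// activity schedule attributes; the field set is an assumption.
type activityTimeouts struct {
	ScheduleToCloseSeconds int32
	ScheduleToStartSeconds int32
	StartToCloseSeconds    int32
}

// clampToWorkflowTimeout lowers every activity timeout that exceeds the
// workflow timeout instead of rejecting the decision. It reports whether
// any field was changed.
func clampToWorkflowTimeout(a *activityTimeouts, wfTimeout int32) bool {
	changed := false
	for _, field := range []*int32{
		&a.ScheduleToCloseSeconds,
		&a.ScheduleToStartSeconds,
		&a.StartToCloseSeconds,
	} {
		if *field > wfTimeout {
			*field = wfTimeout
			changed = true
		}
	}
	return changed
}

func main() {
	a := activityTimeouts{
		ScheduleToCloseSeconds: 7200,
		ScheduleToStartSeconds: 60,
		StartToCloseSeconds:    5000,
	}
	changed := clampToWorkflowTimeout(&a, 3600)
	fmt.Printf("changed=%t timeouts=%+v\n", changed, a)
}
```

Clamping keeps existing workflows running when they request an oversized activity timeout, which addresses the production concern raised above.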
Currently we do not limit the workflow timeout for customers. However, the Cassandra TTL maximum is 630720000 seconds (20 years), so if a customer supplies a larger value, it causes errors such as "RecordWorkflowExecutionStarted operation failed" and CreateTask failures.
This PR protects: