Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add maximum timeout protection #946

Merged
merged 6 commits into from
Jul 13, 2018
Merged

Add maximum timeout protection #946

merged 6 commits into from
Jul 13, 2018

Conversation

vancexu
Copy link
Contributor

@vancexu vancexu commented Jul 9, 2018

Currently we do not limit workflow timeout for customers. But because Cassandra TTL maximum is 630720000 (20 years) and if customer put a number larger than that, it will cause errors like "RecordWorkflowExecutionStarted operation failed" , and CreateTask failure.

This PR protects:

  1. Task ScheduleToClose
  2. Visibility TTL

@vancexu vancexu requested review from mfateev and samarabbas July 9, 2018 23:17
@@ -1349,11 +1349,21 @@ func (wh *WorkflowHandler) StartWorkflowExecution(
Message: "A valid ExecutionStartToCloseTimeoutSeconds is not set on request."}, scope)
}

if startRequest.GetExecutionStartToCloseTimeoutSeconds() > common.MaxWorkflowTimeout {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we decided to not restrict workflow timeout and instead have a max limit on decision task timeout of 1 year.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

if startRequest.GetTaskStartToCloseTimeoutSeconds() <= 0 {
return nil, wh.error(&gen.BadRequestError{
Message: "A valid TaskStartToCloseTimeoutSeconds is not set on request."}, scope)
}

if startRequest.GetTaskStartToCloseTimeoutSeconds() > startRequest.GetExecutionStartToCloseTimeoutSeconds() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be limiting decision task timeout to some reasonable value. Let's have a discussion on this.

@@ -2449,6 +2449,13 @@ func validateActivityScheduleAttributes(attributes *workflow.ScheduleActivityTas
return &workflow.BadRequestError{Message: "A valid timeout may not be negative."}
}

if attributes.GetScheduleToCloseTimeoutSeconds() > common.MaxWorkflowTimeout ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably this should compare activity timeout against workflow timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, changed

@vancexu vancexu force-pushed the ptimeout branch 2 times, most recently from cec1f6c to a957592 Compare July 10, 2018 23:30
"Domain": domain,
"WorkflowID": wid,
"WorkflowType": wfType,
}).Warnf("Decision timeout %d is too large", t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either use an eventID for this log or don't use Warnf and add timeout as a tag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -150,7 +152,7 @@ func (v *cassandraVisibilityPersistence) Close() {

func (v *cassandraVisibilityPersistence) RecordWorkflowExecutionStarted(
request *RecordWorkflowExecutionStartedRequest) error {
ttl := request.WorkflowTimeout + openExecutionTTLBuffer
ttl := common.MinInt64(request.WorkflowTimeout+openExecutionTTLBuffer, maxCassandraTTL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open visibility record will be deleted after this TTL. It will be very confusing if we allow WorkflowTimeout of greater than 20 years but only allow visibility record for upto 20 years max. I think we should just drop the TTL part from the query if Workflow Timeout is bigger than the supported TTL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is better, changed

@@ -2463,6 +2463,13 @@ func validateActivityScheduleAttributes(attributes *workflow.ScheduleActivityTas
return &workflow.BadRequestError{Message: "A valid timeout may not be negative."}
}

if attributes.GetScheduleToCloseTimeoutSeconds() > wfTimeout ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than failing decision tasks can we reduce the timeout to be the same as workflow timeout?
I'm worried this might cause issues in production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, changed

@vancexu vancexu merged commit 19e4de2 into master Jul 13, 2018
@vancexu vancexu deleted the ptimeout branch July 13, 2018 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants