Fix 1059 - TaskRun controller update status and labels on error #1204
The following is the coverage report on pkg/.
/approve cancel
/test pull-tekton-pipeline-integration-tests
LGTM
Question about the use cases for this - from #1059:
When reconcile returns an error, the status, labels and annotations of the TaskRun/PipelineRun should still be sync'ed back.
In case of error, the error is returned, and no update happens.
There are only a small number of cases where this is relevant; nonetheless it should be fixed and unit test coverage added.
I want to double check that we actually want to update the status in this case. I can't really think of a reason not to; however, just to make sure we're on the same page about the behaviour: if you look at how Reconcile is called (https://github.com/knative/pkg/blob/1a3a36a996dec7645bee7084153ef8b7273e1028/controller/controller.go#L299), if an error is returned, the item will get queued again (so theoretically it wouldn't stay in this state very long).
I think part of the reasoning behind not trying to update the status might also be that if something is wrong in the underlying system, then there's a good chance the attempt to update the status will itself fail. Using the multierror should at least avoid hiding that error, but I'm just not 100% clear that it's worth updating the status.
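To make the requeue behaviour mentioned above concrete, here is a rough sketch of a work loop that puts a key back on the queue whenever reconcile returns an error. It is not the knative/pkg implementation: `processQueue`, `failNTimes`, and the `maxRetries` cap are hypothetical names and assumptions made only for illustration.

```go
package main

import "fmt"

// processQueue is a simplified, hypothetical work loop (not knative/pkg code).
// A key whose reconcile call returns an error is appended to the queue again,
// up to maxRetries extra attempts. It returns how often each key was tried.
func processQueue(keys []string, maxRetries int, reconcile func(key string) error) map[string]int {
	attempts := make(map[string]int)
	queue := append([]string(nil), keys...)
	for len(queue) > 0 {
		key := queue[0]
		queue = queue[1:]
		attempts[key]++
		if err := reconcile(key); err != nil && attempts[key] <= maxRetries {
			// The error is not final: the item is queued again.
			queue = append(queue, key)
		}
	}
	return attempts
}

// failNTimes returns a hypothetical reconcile function that fails the first
// n times it sees a given key and succeeds afterwards.
func failNTimes(n int) func(string) error {
	failures := make(map[string]int)
	return func(key string) error {
		if failures[key] < n {
			failures[key]++
			return fmt.Errorf("transient failure reconciling %s", key)
		}
		return nil
	}
}

func main() {
	attempts := processQueue([]string{"default/my-taskrun"}, 5, failNTimes(2))
	fmt.Println(attempts["default/my-taskrun"])
}
```

Under these assumptions, an object only stays in a stale state until a later attempt succeeds, which is the point raised in the comment above.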
// In case of reconcile errors, we store the error in a multierror, attempt
// to update, and return the original error combined with any update error
merr error
I find that global variables tend to cause problems (http://wiki.c2.com/?GlobalVariablesAreBad seems like an okay source of info about this) - for example, if more than one Reconcile happens at a time (and I think it might, I'm not sure), then multiple goroutines could try to write to this simultaneously.
Thanks for catching this and for the pointers - I've got some reading and investigating to do.
I think it's reasonable to assume that more than one Reconcile could happen at the same time, so I should fix this.
Done
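For readers following this thread, a small sketch of why a per-call error variable avoids the problem: each goroutine accumulates into its own `merr`, so concurrent calls cannot overwrite each other's errors. `reconcileKey` and `runConcurrently` are hypothetical helpers, and `errors.Join` (Go 1.20+) stands in for the multierror package; none of this is the code from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reconcileKey is a hypothetical reconcile function. The accumulated error
// lives in a local variable, so every call has its own copy.
func reconcileKey(key string, shouldFail bool) error {
	var merr error // local to this call, never shared between goroutines
	if shouldFail {
		merr = errors.Join(merr, fmt.Errorf("reconcile %s failed", key))
	}
	return merr
}

// runConcurrently reconciles all keys in parallel and collects the error
// returned for each key. The results map is guarded by a mutex because it
// is the only state shared between goroutines.
func runConcurrently(keys []string, failing map[string]bool) map[string]error {
	results := make(map[string]error, len(keys))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, key := range keys {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			err := reconcileKey(k, failing[k])
			mu.Lock()
			results[k] = err
			mu.Unlock()
		}(key)
	}
	wg.Wait()
	return results
}

func main() {
	results := runConcurrently(
		[]string{"run-a", "run-b"},
		map[string]bool{"run-b": true},
	)
	fmt.Println(results["run-a"], results["run-b"])
}
```

With a package-level `merr`, the two goroutines would write to the same variable without synchronisation, which is the data race described above.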
OK, I see your point. In my understanding there are two types of failures that may happen during a
Is it correct that the last three are "recoverable"? In any case like you said It may be worth adding some documentation to BTW, there is also the case of |
I think that's a really good idea - I had to go and track down the code in knative/pkg to double check :S
That doesn't sound right 😅
I guess it's hard to say! It would probably depend on what caused the issue - I think my rule of thumb would be: if the error was related to the user's actions (i.e. something the user did incorrectly), it's "unrecoverable", but if it's not the user, if it's the underlying system (e.g. the service account running the pipelinerun controller can't create TaskRuns or something), then it's "recoverable".
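One possible way to encode that rule of thumb, as a sketch only: mark user-caused failures with a dedicated error type and treat everything else as retryable. The `userError` type and the `newUserError`, `newSystemError`, and `shouldRetry` helpers are hypothetical names invented for this illustration, not part of Tekton or knative/pkg.

```go
package main

import (
	"errors"
	"fmt"
)

// userError is a hypothetical marker for failures caused by the user's input
// (for example an invalid spec). Retrying will not fix these.
type userError struct{ err error }

func (e *userError) Error() string { return e.err.Error() }
func (e *userError) Unwrap() error { return e.err }

// newUserError wraps a message as an "unrecoverable" user error.
func newUserError(msg string) error { return &userError{err: errors.New(msg)} }

// newSystemError builds a plain error, standing in for a failure of the
// underlying system (for example missing permissions).
func newSystemError(msg string) error { return errors.New(msg) }

// shouldRetry reports whether reconciling again might help: nil errors and
// user errors are not retried, everything else is treated as recoverable.
func shouldRetry(err error) bool {
	if err == nil {
		return false
	}
	var ue *userError
	return !errors.As(err, &ue)
}

func main() {
	invalid := fmt.Errorf("validating TaskRun: %w", newUserError("unknown parameter"))
	forbidden := newSystemError("service account cannot create TaskRuns")
	fmt.Println(shouldRetry(invalid), shouldRetry(forbidden))
}
```

Because `errors.As` walks the wrap chain, a user error stays "unrecoverable" even after callers add context with `%w`.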
The code assumes that these errors such as those from
Have not actually tried this out but from looking at the code it seems like instead of returning |
I think this should be good to go once we ensure that merr is not a global variable.
82a6df3 to 495f094
Fixed that and rebased
The TaskRun controller reconcile returns errors immediately. It should instead first attempt to update status and labels and then return both the original error plus any further error raised during the update. Fixes tektoncd#1059
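A minimal sketch of the behaviour this commit message describes, under stated assumptions: `taskRun`, `reconcile`, and `updateStatusAndLabels` are hypothetical stand-ins for the real Tekton types and clients, the in-memory `store` map plays the role of the API server, and `errors.Join` (Go 1.20+) replaces the multierror package discussed in the review. It illustrates the idea and is not the code from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// taskRun is a hypothetical, heavily reduced stand-in for a TaskRun.
type taskRun struct {
	Name   string
	Status string
	Labels map[string]string
}

var (
	errReconcile = errors.New("reconcile failed")
	errUpdate    = errors.New("status and labels update failed")
)

// reconcile mutates the in-memory object and may fail (hypothetical).
func reconcile(tr *taskRun, fail bool) error {
	tr.Labels["example/reconciled"] = "true"
	if fail {
		tr.Status = "Failed"
		return errReconcile
	}
	tr.Status = "Succeeded"
	return nil
}

// updateStatusAndLabels writes the object back to the store (hypothetical).
func updateStatusAndLabels(store map[string]taskRun, tr *taskRun, fail bool) error {
	if fail {
		return errUpdate
	}
	store[tr.Name] = *tr
	return nil
}

// Reconcile always attempts the update, even when reconcile failed, and
// returns the original error combined with any update error.
func Reconcile(store map[string]taskRun, tr *taskRun, failReconcile, failUpdate bool) error {
	var merr error // local, so concurrent calls do not share it
	if err := reconcile(tr, failReconcile); err != nil {
		merr = errors.Join(merr, err)
	}
	if err := updateStatusAndLabels(store, tr, failUpdate); err != nil {
		merr = errors.Join(merr, err)
	}
	return merr
}

func main() {
	store := map[string]taskRun{}
	tr := &taskRun{Name: "build", Labels: map[string]string{}}
	err := Reconcile(store, tr, true, false)
	fmt.Println(store["build"].Status, errors.Is(err, errReconcile))
}
```

In this sketch the failed status still reaches the store, and a caller can use `errors.Is` to check for either the reconcile error or the update error in the combined result.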
495f094 to b8c1c42
Done!
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gavinfish, vdemeester
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Changes
The TaskRun controller updates status and labels even when an error occurs.
As described in #1059, the TaskRun controller fails to update the status when an error is received from reconcile. The comments in the code even suggest that the update should happen, but the implementation does not match them. This PR fixes that.
These are the criteria that every PR should meet, please check them off as you review them:
See the contribution guide for more details.
Double check this list of stuff that's easy to miss:
cmd dir, please update the release Task and TaskRun to build and release this image
Reviewer Notes
If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.
Release Notes
N/A - no API changes.