-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix restore functionality hangs forever #4918
Conversation
lgtm, although we should probably change the |
I agree with @benbjohnson @oiooj -- thanks, you have found a real bug in our code. However, the better fix is as shown here: https://github.com/influxdb/influxdb/blob/master/services/opentsdb/service.go#L211 Would you mind following this pattern and fixing the bug that way? We shouldn't have coded string-compares. |
@otoolep fixed |
@@ -735,12 +734,20 @@ func (s *Store) serveExecListener() { | |||
// Accept next TCP connection. | |||
var err error | |||
conn, err := s.ExecListener.Accept() | |||
if err != nil { | |||
if strings.Contains(err.Error(), "connection closed") { | |||
if opErr, ok := err.(*net.OpError); ok && opErr.Temporary() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right to me.
If the error is not temporary, then simply return. Otherwise continue in the loop. Won't that work?
Also, why are you sleeping for a second? Is that debug code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@otoolep If the error is temporary, then continue. Other error return. if error == nil , handle this Conn.
And there is no need to sleep for a second, deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see where you are going.
This still seems a little bit awkward to me. Let me explain.
If the error returned from Accept
is temporary, then keep looping. However, if the error is not temporary, then return. The way other code (code that is shutting down the system) signals that the function should return is simply by closing the listener. That will result in a non-temporary error, so polling closing
isn't really necessary.
Perhaps this wouldn't work with the ExecListener
for some reason? (I have not looked at that code yet).
s.Logger.Printf("exec listener temporary accept error: %s", err) | ||
|
||
select { | ||
case <-s.closing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect that if a signal is sent on closing
then the ExecListener
would be closed from somewhere else and a non-temporary error would result. I might be wrong though.
@otoolep yes, you're right. Now if I send a signal to shutdown the system all is normal.
just like the cluster service. |
Thanks @oiooj -- +1 from me. @benbjohnson -- would you mind re-reviewing? |
lgtm |
Fix restore functionality hangs forever
if return
errors.New("listener closing")
, theserveExecListener
function will not return in a loop,and the program will hangs at https://github.com/influxdb/influxdb/blob/master/meta/store.go#L567Fix #4806
Ref: https://github.com/influxdb/influxdb/blob/master/meta/store.go#L739