-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metrics: record message/request event even in case of error #464
Conversation
... otherwise any kinde of error ratio is meaningless.
@aarshkshah1992 the point of this PR is that at the moment those metrics only record successful events. This makes it at least weird to compute an error ratio. One could argue that instead of doing |
dht_net.go
Outdated
@@ -107,6 +109,8 @@ func (dht *IpfsDHT) handleNewMessage(s network.Stream) bool { | |||
ctx, | |||
[]tag.Mutator{tag.Upsert(metrics.KeyMessageType, "UNKNOWN")}, | |||
metrics.ReceivedMessageErrors.M(1), | |||
metrics.ReceivedMessages.M(1), | |||
metrics.ReceivedBytes.M(int64(req.Size())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be len(msgbytes)
but we need to record that before we release the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could always be len(msgbytes)
, right ? Does it ever make sense to use req.Size()
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that, let me know if that doesn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it ever make sense to use
For requests, no.
dht_net.go
Outdated
@@ -96,6 +96,8 @@ func (dht *IpfsDHT) handleNewMessage(s network.Stream) bool { | |||
ctx, | |||
[]tag.Mutator{tag.Upsert(metrics.KeyMessageType, "UNKNOWN")}, | |||
metrics.ReceivedMessageErrors.M(1), | |||
metrics.ReceivedMessages.M(1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, we probably shouldn't record either a received message or a message error unless we receive a non-empty message. If the stream's broken for some reason, that's not something the DHT is really concerned about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be fair to not record anything at all ? My understanding is that if the stream is broken we don't get io.EOF
but we don't have a message either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I read your message, the more I get confused ;-). It seems to me that there is three thing we can do:
- valid message: record
metrics.ReceivedMessages.M(1)
- broken message: record both
metrics.ReceivedMessages.M(1)
andmetrics.ReceivedMessageErrors.M(1)
- no message: don't record anything
So what about:
io.EOF
--> no messagelen(msgbytes)==0
--> no messagelen(msgbytes)!=0
--> broken message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that, let me know if that doesn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also return early on err.Error() == "stream reset"
but that's quite brittle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I guess it depends on your perspective. From the standpoint of the query logic, these messages are "sent". From the standpoint of the network, these messages haven't been sent. I'm inclined to merge this as long as we leave a comment explaining it. Especially because outbound request counting is now strictly more accurate (before, we weren't counting requests that successfully sent but didn't receive a response, now we're counting all attempts). |
Alright, I addressed all your comments, it definitely looks more correct now. |
Stumbled upon this while working on #453
The metric for message/request sent/received was not incremented when an error occurred but the error count was. This would lead to a meaningless error ratio.