metrics: record message/request event even in case of error #464

MichaelMure · 2020-02-27T17:02:16Z

Stumbled upon this while working on #453

The metrics for messages/requests sent/received were not incremented when an error occurred, but the error count was. This made any error ratio meaningless.

... otherwise any kind of error ratio is meaningless.


MichaelMure · 2020-02-28T17:10:47Z

@aarshkshah1992 the point of this PR is that, at the moment, those metrics only record successful events. This makes computing an error ratio awkward at best.

One could argue that instead of computing err_count / event_count, you can still get the ratio with err_count / (valid_count + err_count), but I would say that this is not what either the name or the description of the metrics implies.
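To make the arithmetic concrete, here is a minimal Go sketch with made-up counter values; the function names are hypothetical and not part of the DHT's metrics package. It compares err_count / event_count under the old and new behaviors with the err_count / (valid_count + err_count) workaround.

```go
package main

import "fmt"

// ratioIfEventsIncludeErrors is the reading implied by the metric names:
// errors divided by all events. It is only correct when every event,
// failed or not, increments the event counter.
func ratioIfEventsIncludeErrors(errCount, eventCount int64) float64 {
	if eventCount == 0 {
		return 0
	}
	return float64(errCount) / float64(eventCount)
}

// ratioIfEventsAreSuccessesOnly is the workaround needed when the event
// counter only records successful events.
func ratioIfEventsAreSuccessesOnly(errCount, validCount int64) float64 {
	total := validCount + errCount
	if total == 0 {
		return 0
	}
	return float64(errCount) / float64(total)
}

func main() {
	// Hypothetical traffic: 900 successful messages and 100 failed ones.
	var valid, failed int64 = 900, 100

	// Old behavior: the event counter only saw the 900 successes.
	naiveOld := ratioIfEventsIncludeErrors(failed, valid)     // 100/900, overstated
	workaround := ratioIfEventsAreSuccessesOnly(failed, valid) // 100/1000

	// New behavior: the event counter sees all 1000 messages.
	naiveNew := ratioIfEventsIncludeErrors(failed, valid+failed) // 100/1000

	fmt.Printf("old, naive formula: %.4f\n", naiveOld)
	fmt.Printf("old, workaround:    %.4f\n", workaround)
	fmt.Printf("new, naive formula: %.4f\n", naiveNew)
}
```

Once every event is counted, the straightforward formula and the workaround agree, which is the point of the change.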

Stebalien · 2020-02-29T00:06:52Z

dht_net.go

```diff
@@ -107,6 +109,8 @@ func (dht *IpfsDHT) handleNewMessage(s network.Stream) bool {
 				ctx,
 				[]tag.Mutator{tag.Upsert(metrics.KeyMessageType, "UNKNOWN")},
 				metrics.ReceivedMessageErrors.M(1),
+				metrics.ReceivedMessages.M(1),
+				metrics.ReceivedBytes.M(int64(req.Size())),
```


This should be len(msgbytes), but we need to record it before we release the message.

It could always be len(msgbytes), right? Does it ever make sense to use req.Size() instead?

I did that; let me know if that doesn't make sense.

> Does it ever make sense to use req.Size() instead?

For requests, no.
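A minimal sketch of the ordering discussed here, assuming a pooled read buffer that must be released after use. `bufPool`, `readMsg`, `releaseMsg`, `decodeStub` and `recvMetrics` are hypothetical stand-ins, not the DHT's actual reader or opencensus measures. The byte count comes from the raw bytes, so it is known even when decoding fails, and it is captured before the buffer goes back to the pool.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// bufPool is a hypothetical stand-in for the pooled buffers a message
// reader hands out; the caller must release each buffer after use.
var bufPool = sync.Pool{New: func() any { return make([]byte, 0, 4096) }}

// recvMetrics is a hypothetical stand-in for the received-message measures.
type recvMetrics struct {
	messages, bytes, errs int64
}

// readMsg copies a frame into a pooled buffer.
func readMsg(frame []byte) []byte {
	buf := bufPool.Get().([]byte)
	return append(buf[:0], frame...)
}

// releaseMsg returns the buffer to the pool; it must not be used afterwards.
func releaseMsg(buf []byte) { bufPool.Put(buf[:0]) }

// decodeStub is a hypothetical decoder that rejects frames starting with 0xff.
func decodeStub(b []byte) error {
	if len(b) > 0 && b[0] == 0xff {
		return errors.New("malformed message")
	}
	return nil
}

func handleFrame(m *recvMetrics, frame []byte, decode func([]byte) error) {
	msgbytes := readMsg(frame)
	// Size of the raw bytes: available even if decoding fails, unlike the
	// size of a decoded request. Captured while we still own the buffer.
	size := int64(len(msgbytes))
	err := decode(msgbytes)
	// Release immediately instead of deferring to the end of the handler.
	releaseMsg(msgbytes)

	// Record the message and its bytes on both paths; add an error on failure.
	m.messages++
	m.bytes += size
	if err != nil {
		m.errs++
	}
}

func main() {
	m := &recvMetrics{}
	handleFrame(m, []byte{0x01, 0x02, 0x03}, decodeStub)
	handleFrame(m, []byte{0xff, 0x00}, decodeStub)
	fmt.Printf("messages=%d bytes=%d errors=%d\n", m.messages, m.bytes, m.errs)
	// prints "messages=2 bytes=5 errors=1"
}
```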


Stebalien · 2020-02-29T00:10:02Z

dht_net.go

```diff
@@ -96,6 +96,8 @@ func (dht *IpfsDHT) handleNewMessage(s network.Stream) bool {
 				ctx,
 				[]tag.Mutator{tag.Upsert(metrics.KeyMessageType, "UNKNOWN")},
 				metrics.ReceivedMessageErrors.M(1),
+				metrics.ReceivedMessages.M(1),
```


Hm, we probably shouldn't record either a received message or a message error unless we receive a non-empty message. If the stream's broken for some reason, that's not something the DHT is really concerned about.

Would it be fair not to record anything at all? My understanding is that if the stream is broken, we don't get io.EOF, but we don't have a message either.

The more I read your message, the more confused I get ;-). It seems to me that there are three cases:

- valid message: record metrics.ReceivedMessages.M(1)
- broken message: record both metrics.ReceivedMessages.M(1) and metrics.ReceivedMessageErrors.M(1)
- no message: don't record anything

So what about:

- io.EOF --> no message
- len(msgbytes)==0 --> no message
- len(msgbytes)!=0 --> broken message

I did that; let me know if that doesn't make sense.

We could also return early when err.Error() == "stream reset", but that's quite brittle.


Stebalien · 2020-02-29T01:02:35Z

I guess it depends on your perspective. From the standpoint of the query logic, these messages are "sent". From the standpoint of the network, these messages haven't been sent.

I'm inclined to merge this as long as we leave a comment explaining it, especially because outbound request counting is now strictly more accurate: before, we weren't counting requests that were sent successfully but didn't receive a response; now we count all attempts.

MichaelMure · 2020-03-02T12:17:15Z

Alright, I've addressed all your comments; it definitely looks more correct now.

MichaelMure · 2020-03-03T10:40:36Z

FYI, with this PR applied, the reported error ratio for received messages/requests falls to ~0.3%, which seems to reflect what is actually happening.

The breakdown of the remaining errors by message type is:

- unknown: 55%
- find_node: 33%
- put_value: 11%
- get_providers: 1.3%
- get_value: 0.2%

MichaelMure added 2 commits February 27, 2020 15:52

immediate buffer release instead of deferring

fe2d21a

metrics: also record message/request event on error

979ed6a

... otherwise any kind of error ratio is meaningless.

Stebalien requested a review from aarshkshah1992 February 27, 2020 17:34

aarshkshah1992 suggested changes Feb 28, 2020


MichaelMure mentioned this pull request Feb 28, 2020

Occasional blocking when using Provide() #453

Closed

Stebalien requested changes Feb 29, 2020


Stebalien reviewed Feb 29, 2020


net: address PR comments regarding metrics

faa5638

Stebalien approved these changes Mar 3, 2020


Stebalien merged commit c2631d9 into libp2p:master Mar 3, 2020
