relay service: add metrics #2154
1d4d3cd to a2e712d
r.gc()
r.wg.Done()
Was this a bug in our implementation? If so, would you mind opening a separate PR?
Yes, it's a race condition. Will do: #2162
This is the fix for the bug. I added the gc call there since it simplified counting closed connections.
https://github.com/libp2p/go-libp2p/pull/2164/files
break
}
if r.metricsTracer != nil {
r.metricsTracer.BytesTransferred(nr + nw)
Aren't you counting every byte twice here?
Okay, true. BytesTransferred should only count one of them. I did it because bandwidth used would be incoming + outgoing, but that should be handled in the dashboard. Will fix.
ConnectionClosed(d time.Duration)
// ConnectionRequestHandled tracks metrics on handling a relay connection request
// rejectionReason is ignored for status other than `requestStatusRejected`
ConnectionRequestHandled(status string, rejectionReason string)
Would it make sense to split this up into ConnectionRequestHandled (for success) and ConnectionRequestRejected? What would be the difference between ConnectionRequestReceived and ConnectionRequestHandled?
Yeah, received was the same. I've removed ConnectionRequestReceived and ReservationRequestReceived and kept ReservationRequestHandled and ConnectionRequestHandled.
p := s.Conn().RemotePeer()
a := s.Conn().RemoteMultiaddr()

if isRelayAddr(a) {
log.Debugf("refusing relay reservation for %s; reservation attempt over relay connection", p)
r.handleError(s, pbv2.Status_PERMISSION_DENIED)
r.handleErrorAndTrackMetrics(s, pbv2.HopMessage_RESERVE, pbv2.Status_PERMISSION_DENIED,
I'm not a big fan of the handleError function. I know you didn't introduce it, so this is not your fault :)
What about returning an error and/or a status code, and handling it in the caller?
lemme see how that looks.
This looks much better, thanks!
I've left handleError as it is; if you want to remove it, I can open a separate PR.
Yeah, let's do this in a follow-up PR.
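The suggestion to return a status and let the caller respond could look roughly like this. It is a minimal model, not the go-libp2p code: Status and its values stand in for pbv2.Status, and reservationRequest, handleReserve, and handleStream are hypothetical. The validation step only decides the outcome, while one caller writes the response and records metrics, so no error path can skip either step.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for pbv2.Status.
type Status int

const (
	StatusOK Status = iota
	StatusPermissionDenied
)

// reservationRequest is a hypothetical, simplified reservation request.
type reservationRequest struct {
	peer      string
	overRelay bool // true if the request arrived over a relayed connection
}

// handleReserve validates the request and reports the outcome instead of
// writing an error response itself.
func handleReserve(req reservationRequest) (Status, error) {
	if req.overRelay {
		return StatusPermissionDenied, fmt.Errorf("refusing relay reservation for %s: reservation attempt over relay connection", req.peer)
	}
	return StatusOK, nil
}

// handleStream is the one place that writes the response and tracks metrics.
func handleStream(req reservationRequest, writeStatus, trackMetrics func(Status)) error {
	status, err := handleReserve(req)
	writeStatus(status)
	trackMetrics(status)
	return err
}

func main() {
	var written, tracked []Status
	err := handleStream(
		reservationRequest{peer: "peerA", overRelay: true},
		func(s Status) { written = append(written, s) },
		func(s Status) { tracked = append(tracked, s) },
	)
	fmt.Println(written, tracked, err)
}
```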
c3afa3a to bf618ac
bf618ac to 6fcf248
c3409a0 to 9121abd
9121abd to e24cfc7
"refId": "C"
}
],
"title": "Connection Duration",
This diagram seems to be missing the unit.
I'm also not sure how useful it is; the graph looks kind of wild on my instance.
Maybe it's easier if we just track the average:
rate(libp2p_relaysvc_connection_duration_seconds_sum[$__range])/rate(libp2p_relaysvc_connection_duration_seconds_count[$__range])
(Please double-check whether $__range is the appropriate variable here.)
I'm keeping $__range because I think it is the most informative. $__rate_interval gives you the average at that point in time, but the graph is then very spiky, and since it is an average over all connections in that interval, a spike doesn't necessarily mean something is wrong.
On the other hand, with $__range, increasing the dashboard's time range changes this graph, which seems wrong.
changed the label to rolling average.
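The query rate(..._sum)/rate(..._count) yields an average connection duration because both rates divide by the same window length, which cancels, leaving Δsum/Δcount. A minimal Go sketch of that arithmetic follows; sample and avgDuration are hypothetical, and unlike Prometheus it ignores extrapolation and counter resets. The window width is exactly the $__range versus $__rate_interval trade-off: a wide window smooths the graph, while a narrow one reflects recent connections but spikes.

```go
package main

import "fmt"

// sample is a hypothetical snapshot of the histogram's _sum and _count series
// (libp2p_relaysvc_connection_duration_seconds_sum / _count).
type sample struct {
	sumSeconds float64
	count      float64
}

// avgDuration returns the mean duration of connections that closed between
// two snapshots: Δsum/Δcount. The per-second divisor used by rate() is the
// same for both series, so it cancels out.
func avgDuration(start, end sample) (float64, bool) {
	dCount := end.count - start.count
	if dCount <= 0 {
		return 0, false // no connections closed in the window
	}
	return (end.sumSeconds - start.sumSeconds) / dCount, true
}

func main() {
	start := sample{sumSeconds: 120, count: 10}
	end := sample{sumSeconds: 420, count: 25}
	avg, ok := avgDuration(start, end)
	fmt.Println(avg, ok) // 20 true
}
```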
Metrics Added:
ReservationRequest: Opened, Closed, Renewed
ReservationRequestResponseStatus
ReservationRejectionReason
ConnectionRequest: Opened, Closed
ConnectionRequestResponseStatus
ConnectionRejectionReason
ConnectionDuration
BytesTransferred
RelayStatus
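One way the reservation counters in the list (Opened, Closed, Renewed) could be derived is sketched below, assuming a request from a peer that already holds a reservation counts as a renewal. The reservations and reservationMetrics types and their methods are hypothetical; the relay service may classify renewals differently.

```go
package main

import "fmt"

// reservationMetrics is a hypothetical in-memory model of the
// Opened / Closed / Renewed reservation counters.
type reservationMetrics struct {
	opened, closed, renewed int
}

// reservations tracks which peers currently hold a reservation.
type reservations struct {
	active  map[string]bool
	metrics reservationMetrics
}

// reserve counts a request from a peer that already holds a reservation as a
// renewal, and any other accepted request as a newly opened reservation.
func (r *reservations) reserve(peer string) {
	if r.active[peer] {
		r.metrics.renewed++
		return
	}
	r.active[peer] = true
	r.metrics.opened++
}

// expire removes a reservation and counts it as closed (only once per reservation).
func (r *reservations) expire(peer string) {
	if r.active[peer] {
		delete(r.active, peer)
		r.metrics.closed++
	}
}

func main() {
	r := &reservations{active: map[string]bool{}}
	r.reserve("peerA")
	r.reserve("peerA") // renewal
	r.reserve("peerB")
	r.expire("peerA")
	fmt.Printf("%+v active=%d\n", r.metrics, len(r.active)) // {opened:2 closed:1 renewed:1} active=1
}
```

With this split, the number of currently active reservations is simply opened minus closed, so a dashboard does not need a separate gauge for it.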