Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding node software version info to Ping metrics. #662

Merged
merged 16 commits into from
Apr 29, 2021

Conversation

vishalchangrani
Copy link
Contributor

@vishalchangrani vishalchangrani commented Apr 26, 2021

Currently, the Ping metrics reports the RTT it takes to ping a remote node and the operator name as part of node info.
This change adds another label - nodelabel to the Ping metrics to report the node software version that each of the node in the Flow network is running.

The current Ping implementation is using the Libp2p provided Ping functionality which is supported via the special Ping protocol - https://docs.libp2p.io/concepts/protocols/#ping. However, this protcol cannot be extended to add in additional information. The other protocols such as identity is also not extensible and mostly designed for libp2p core functionality.

Hence, to support the version and also the block height later, I created something on the same lines as the libp2p Ping protocol calling it the Flow Ping protocol. The Pings are exchanged on the a special protocol ID /flow/ping/<root block id> and are handled by the libp2pNode.go file.

This is how it looks on localnet - notice the additional label - nodeversion

Screen Shot 2021-04-27 at 11 14 31 PM

@@ -226,62 +228,62 @@ docker-ci-benchmark-team-city:

.PHONY: docker-build-collection
docker-build-collection:
docker build -f cmd/Dockerfile --build-arg TARGET=collection --target production \
docker build -f cmd/Dockerfile --build-arg TARGET=collection --build-arg COMMIT=$(COMMIT) --build-arg VERSION=$(IMAGE_TAG) --target production \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The software version is passed in as linker params (ld flags). The Image_Tag will be set to something like - v0.16.0-patch1. This will be the software version reported by the Ping metric.

@@ -7,7 +7,9 @@ COMMIT := $(shell git rev-parse HEAD)
VERSION := $(shell git describe --tags --abbrev=0 --exact-match 2>/dev/null)

# Image tag: if image tag is not set, set it with version (or short commit if empty)
ifeq (${IMAGE_TAG},)
Copy link
Contributor Author

@vishalchangrani vishalchangrani Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without this, the IMAGE_TAG even if specified is always set to the version

-t "$(CONTAINER_REGISTRY)/collection:latest" -t "$(CONTAINER_REGISTRY)/collection:$(SHORT_COMMIT)" -t "$(CONTAINER_REGISTRY)/collection:$(IMAGE_TAG)" .

.PHONY: docker-build-collection-debug
docker-build-collection-debug:
docker build -f cmd/Dockerfile --build-arg TARGET=collection --target debug \
docker build -f cmd/Dockerfile --build-arg TARGET=collection --build-arg COMMIT=$(COMMIT) --build-arg VERSION=$(IMAGE_TAG) --target debug \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker-build- commands for each of the node type now pass-in two params to the dockerfile - COMMIT and VERSION. The docker file then uses is to set the LD flags when building those images.

cmd/Dockerfile Outdated
@@ -46,7 +46,9 @@ WORKDIR /app
# https://github.com/golang/go/issues/27719#issuecomment-514747274
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
GO111MODULE=on CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build --tags "relic,netgo" -ldflags "-extldflags -static" -o ./app ./cmd/${TARGET}
GO111MODULE=on CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build --tags "relic,netgo" -ldflags "-extldflags -static \
-X 'github.com/onflow/flow-go/cmd/bootstrap/build.commit=${COMMIT}' -X 'github.com/onflow/flow-go/cmd/bootstrap/build.semver=${VERSION}'" \
Copy link
Contributor Author

@vishalchangrani vishalchangrani Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set the build.semver and build.commit. build.semver will be something like v0.16.1-patch1 while build.commit will be the short commit.
Build.commit is currently not being used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github.com/onflow/flow-go/cmd/bootstrap/build is probably not the best place for this to live, now that it's used across multiple binaries. Could we move this to eg cmd/build or utils/build?

@@ -41,3 +41,11 @@ func init() {
commit = undefined
}
}

func SetSemver(sv string) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two setters to make testing easy (without using ldflags)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would much rather simplify testing another way than by adding these setters. Having these makes it too easy to set the wrong version globally.

Instead, what about injecting a function to PingService to get the version? It is likely we will want to take a similar approach anyway to include the block height.

@@ -109,7 +109,7 @@ func (e *Engine) pingNode(peer *flow.Identity) {
id := peer.ID()

// ping the node
rtt, err := e.middleware.Ping(id) // ping will timeout in libp2p.PingTimeoutSecs seconds
resp, rtt, err := e.middleware.Ping(id) // ping will timeout in libp2p.PingTimeoutSecs seconds
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping now returns the response which contains the version field.

@@ -349,5 +349,6 @@ type TransactionMetrics interface {
type PingMetrics interface {
// NodeReachable tracks the round trip time in milliseconds taken to ping a node
// The nodeInfo provides additional information about the node such as the name of the node operator
NodeReachable(node *flow.Identity, nodeInfo string, rtt time.Duration)
// version is the software version the target node is running
NodeReachable(node *flow.Identity, nodeInfo string, rtt time.Duration, version string)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new label - version

@@ -0,0 +1,9 @@
# go get github.com/gogo/protobuf/protoc-gen-gofast
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makefile to regenerate networking protobufs

@@ -0,0 +1,174 @@
package p2p

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loosely based on how libp2p implements it's Ping protocol - https://github.com/libp2p/go-libp2p/blob/master/p2p/protocol/ping/ping.go

}

// PingHandler receives the inbound stream for Flow ping protocol and respond back with the PingResponse message
func (ps *PingService) PingHandler(s network.Stream) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server function that responds to the ping request

}

// Ping sends a Ping request to the remote node and returns the response, rtt and error if any.
func (ps *PingService) Ping(ctx context.Context, p peer.ID) (message.PingResponse, time.Duration, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client function that sends a ping request

@@ -113,6 +114,9 @@ func NewLibP2PNode(logger zerolog.Logger,
return nil, fmt.Errorf("could not bootstrap libp2p host: %w", err)
}

pingLibP2PProtocolID := generateProtocolID(rootBlockID)
pingService := NewPingService(libP2PHost, pingLibP2PProtocolID, logger)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create the PingService to handle incoming ping request and to send ping request to remote nodes

Copy link
Member

@zhangchiqing zhangchiqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I learned something from reviewing this PR. Thanks for adding the comments to explain the changes. 👏

Copy link
Member

@jordanschalm jordanschalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Awesome, thanks for setting this up!

Added some suggestions, I also think we should include the semver and commit in the startup log message.

cmd/Dockerfile Outdated
@@ -46,7 +46,9 @@ WORKDIR /app
# https://github.com/golang/go/issues/27719#issuecomment-514747274
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
GO111MODULE=on CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build --tags "relic,netgo" -ldflags "-extldflags -static" -o ./app ./cmd/${TARGET}
GO111MODULE=on CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build --tags "relic,netgo" -ldflags "-extldflags -static \
-X 'github.com/onflow/flow-go/cmd/bootstrap/build.commit=${COMMIT}' -X 'github.com/onflow/flow-go/cmd/bootstrap/build.semver=${VERSION}'" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github.com/onflow/flow-go/cmd/bootstrap/build is probably not the best place for this to live, now that it's used across multiple binaries. Could we move this to eg cmd/build or utils/build?

Comment on lines 8 to 9
message PingResponse {
string version = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message PingResponse {
string version = 1;
message PingResponse {
string version = 1; // node software version

@@ -163,6 +163,10 @@ func generateProtocolID(rootBlockID string) protocol.ID {
return protocol.ID(FlowLibP2PProtocolIDPrefix + rootBlockID)
}

func generateID(rootBlockID string) protocol.ID {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func generateID(rootBlockID string) protocol.ID {
func generatePingID(rootBlockID string) protocol.ID {

Is this only for ping messages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah only for Ping message. I believe my earlier name change got lost somehow. I have added it back.

case err, ok := <-errCh:
// if and error occur while reposning to a pind request, then reset the stream
if ok {
log.Error().Err(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Error().Err(err)
log.Error().Err(err).Msg("error responding to ping request")

I don't think this line will print anything as-is, believe it needs the Msg call as well

@@ -41,3 +41,11 @@ func init() {
commit = undefined
}
}

func SetSemver(sv string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would much rather simplify testing another way than by adding these setters. Having these makes it too easy to set the wrong version globally.

Instead, what about injecting a function to PingService to get the version? It is likely we will want to take a similar approach anyway to include the block height.

Comment on lines 25 to 35
type PingService struct {
host host.Host
logger zerolog.Logger
pingProtocolID protocol.ID
}

func NewPingService(h host.Host, pingProtocolID protocol.ID, logger zerolog.Logger) *PingService {
ps := &PingService{host: h, logger: logger, pingProtocolID: pingProtocolID}
h.SetStreamHandler(pingProtocolID, ps.PingHandler)
return ps
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for a different approach than SetSemver:

Suggested change
type PingService struct {
host host.Host
logger zerolog.Logger
pingProtocolID protocol.ID
}
func NewPingService(h host.Host, pingProtocolID protocol.ID, logger zerolog.Logger) *PingService {
ps := &PingService{host: h, logger: logger, pingProtocolID: pingProtocolID}
h.SetStreamHandler(pingProtocolID, ps.PingHandler)
return ps
}
type PingService struct {
host host.Host
logger zerolog.Logger
getVersion func() string
pingProtocolID protocol.ID
}
func NewPingService(h host.Host, pingProtocolID protocol.ID, logger zerolog.Logger, getVersion func() string) *PingService {
ps := &PingService{host: h, logger: logger, pingProtocolID: pingProtocolID, getVersion: getVersion}
h.SetStreamHandler(pingProtocolID, ps.PingHandler)
return ps
}

Then we can do:

pingService := NewPingService(..., build.Semver)

And for testing something like:

pingService := NewPingService(..., func() {return "test string"})

Using the function argument directly is simplest and probably fine for now, but if we're adding block height and potentially other things as well, it may be cleaner to create an interface for PingInfoProvider

type interface PingInfoProvider {
      SoftwareVersion() string
      LatestFinalizedBlockHeight() uint64
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. for now I am just passing in the semver value as a string to the NewPingService function since the software version won't change till the node is restarted.
I will define the interface once I start implementing the finalized block height changes.

Copy link
Contributor Author

@vishalchangrani vishalchangrani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I learned something from reviewing this PR. Thanks for adding the comments to explain the changes. 👏

Thanks @zhangchiqing for taking the time. Appreciate it.

)

// the Flow Ping protocol prefix
const FlowLibP2PPingPrefix = "/flow/ping/"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lib2p uses a concept of Protocols to allow discrete one-to-one communication streams between two nodes. So far all the Flow direct communication has been using the Flow Libp2p protocol ID - /flow/push/<root block ID>.

Here, a new protocol ID specifically for Ping is being defined. Similar to https://docs.libp2p.io/concepts/protocols/#ping.

Why a new Protocol ID and why not reuse the old /flow/push protocol ID?

  1. Mainly, to keep core protocol network traffic separate from non-core protocol traffic.
  2. Mimicing, what libp2p themselves do when it comes to ping.
  3. Allows a request-response type of communication not available with the /flow/push protocol.

"github.com/onflow/flow-go/cmd/bootstrap/build"
)

// TestPing tests PingService by creating two libp2p hosts and ping each one from the other
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// SetStreamHandler sets the stream handler of libp2p host of the node.
func (n *Node) SetStreamHandler(handler libp2pnet.StreamHandler) {
// SetFlowProtocolStreamHandler sets the stream handler of Flow libp2p Protocol
func (n *Node) SetFlowProtocolStreamHandler(handler libp2pnet.StreamHandler) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename

go.mod Outdated
@@ -33,6 +33,7 @@ require (
github.com/libp2p/go-libp2p-pubsub v0.4.1
github.com/libp2p/go-libp2p-swarm v0.4.0
github.com/libp2p/go-libp2p-transport-upgrader v0.4.0
github.com/libp2p/go-msgio v0.0.6
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package that helps read and write length-delimited slices (https://github.com/libp2p/go-msgio). Use to read and write the ping request and response.


message PingResponse {
string version = 1;
uint64 blockHeight = 2;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blockHeight not being populated for now.

Copy link
Member

@jordanschalm jordanschalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 LGTM

Could we also add the version to a startup log (ie. in scaffold) so operators have a way to access this easily as well?

@vishalchangrani
Copy link
Contributor Author

💯 LGTM

Could we also add the version to a startup log (ie. in scaffold) so operators have a way to access this easily as well?

Added statements to print the version and the commit id at the node startup in scaffold.go.

// create a new stream to the remote node
s, err := ps.host.NewStream(ctx, p, ps.pingProtocolID)
if err != nil {
return message.PingResponse{}, -1, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return message.PingResponse{}, -1, err
return message.PingResponse{}, -1, fmt.Errorf("could not create stream: %w", err)


if err != nil {
t.Fatal(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may use require.NoError(t, err).

log.Error().Msg("context timed out on ping to remote node")
case <-done:
}
err := s.Reset()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions about this:

  • Assume that our context times out and we reach here. Then does it mean that we have only one single shot to reset the stream? why only a single shot?
  • Why we do reset the stream upon completion of ping?

Copy link
Contributor

@yhassanzadeh13 yhassanzadeh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just a few minor comments/questions.

}

// if ping succeeded, close the stream else reset the stream
go func() {
Copy link
Contributor Author

@vishalchangrani vishalchangrani Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yhassanzadeh13 - thanks for your question earlier about why Reset the stream. I got the reset statement from the libp2p's Ping logic. But after I read your question, I did some investigation and changed the code accordingly.
Libp2p's stream.Close() indeed closes the stream gracefully from both ends - See this issue.
In a nut shell, if you would like to cause an error on the remote node while closing the stream, use stream.Reset() else use stream.Close().
I have modified the code to do that - if the client times out, do a reset and let the remote know that something bad has happened else just close.

@vishalchangrani vishalchangrani merged commit 80dcf86 into master Apr 29, 2021
@vishalchangrani vishalchangrani deleted the vishal/ping_version_info branch April 29, 2021 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants