Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[checkoutservice] add basic metric support #339

Conversation

fatsheep9146
Copy link
Contributor

@fatsheep9146 fatsheep9146 commented Aug 23, 2022

Signed-off-by: Ziqi Zhao [email protected]

related #69.

Changes

Try to enhance checkoutservice with metric support.

Add basic runtime metrics for checkoutservice, using runtime metric instrumentation library.

Here is the metrics example, process_runtime_go_goroutines

image


Following paragraph is the earlier content of this pull request, but these efforts are blocked by open-telemetry/opentelemetry-go-contrib#2700, so remove them for now.

According to https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/rpc.md and as the first try, I first add a metrics 'rpc.server.duration' for unary service.

the metrics with four label
- rpc.system
- rpc.service
- rpc.method
- rpc.grpc.status_code

I made some experiments with new metric, I stop 'shipservice' manually to mock the fail down of shipservice and restart it after a few minutes, and draw a graph indicates the success ratio of checkout service.

the query I used is sum(rate(rpc_server_duration_count{rpc_grpc_status_code="0"}[1m])) / sum(rate(rpc_server_duration_count[1m]))

and the graph is like following with a drop in the middle

@fatsheep9146 fatsheep9146 requested a review from a team August 23, 2022 14:03
reyang
reyang previously requested changes Aug 23, 2022
Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link

github-actions bot commented Sep 1, 2022

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@fatsheep9146
Copy link
Contributor Author

fatsheep9146 commented Sep 1, 2022

#339 (comment)

open-telemetry/opentelemetry-go-contrib#2700

@reyang @austinlparker

I also started to add metrics support for grpc instrumentation library. Should we wait until this pr merged and then update checkout service's library?

@github-actions
Copy link

github-actions bot commented Sep 9, 2022

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 9, 2022
@fatsheep9146
Copy link
Contributor Author

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Related pull requests open-telemetry/opentelemetry-go-contrib#2700 are still being reviewed, not stale.

@fatsheep9146 fatsheep9146 removed the Stale label Sep 9, 2022
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 26, 2022
@reyang reyang removed the Stale label Sep 27, 2022
@cartersocha
Copy link
Contributor

hey @fatsheep9146 lets try to push this forward for the hero scenario asap. Our development freeze starts October 10th.

All we need in the checkout service is a gauge on the cache size + instrumentation libraries

@cartersocha
Copy link
Contributor

@austinlparker, @julianocosta89, @puckpuck fyi lets try to push this forward

@fatsheep9146 fatsheep9146 force-pushed the checkoutservice-metrics-enhancement branch from 770cd2f to b4b653b Compare September 30, 2022 09:31
@fatsheep9146
Copy link
Contributor Author

I refract this pr to use the new version of golang metric sdk and using runtime instrumentation library. Please help review this =D
@cartersocha @austinlparker @puckpuck @julianocosta89

@cartersocha cartersocha requested a review from reyang September 30, 2022 17:14
@cartersocha cartersocha dismissed reyang’s stale review September 30, 2022 17:16

PR has been pivoted and needs a fresh review

Copy link
Member

@julianocosta89 julianocosta89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to test this out.

Building the service is failing with the following message:

=> ERROR [builder 10/10] RUN go build -o /go/bin/checkoutservice/ ./                                                                    6.1s 
------                                                                                                                                        
 > [builder 10/10] RUN go build -o /go/bin/checkoutservice/ ./:                                                                               
#0 3.911 # go.opentelemetry.io/otel/sdk/metric/metricdata                                                                                     
#0 3.911 /go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/metricdata/data.go:62:14: syntax error: unexpected int64, expecting ]        
#0 3.911 /go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/metricdata/data.go:67:15: syntax error: unexpected ), expecting type         
#0 3.911 /go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/metricdata/data.go:70:12: syntax error: unexpected int64, expecting ]        
#0 3.911 /go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/metricdata/data.go:80:13: syntax error: unexpected ), expecting type
#0 3.911 /go/pkg/mod/go.opentelemetry.io/otel/sdk/[email protected]/metricdata/data.go:83:18: syntax error: unexpected int64, expecting ]
#0 3.911 note: module requires Go 1.18
------
failed to solve: executor failed running [/bin/sh -c go build -o /go/bin/checkoutservice/ ./]: exit code: 2

I've tried updating the go version within the Dockerfileand go.mod, but then the build failed with the following:

=> ERROR [builder  9/10] RUN protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./                                         0.4s 
------                                                                                                                                        
 > [builder  9/10] RUN protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./:
#0 0.412 protoc-gen-go: program not found or is not executable
#0 0.412 Please specify a program using absolute path or make sure the program is available in your PATH system variable
#0 0.412 --go_out: protoc-gen-go: Plugin failed with status code 1.
------
failed to solve: executor failed running [/bin/sh -c protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./]: exit code: 1

@austinlparker
Copy link
Member

I'm investigating this - looks like part of it is that the go version in the container isn't 1.18+

@julianocosta89
Copy link
Member

@austinlparker yes, but when I tried updating the go version within the Dockerfile and go.mod, the build failed with the following:

=> ERROR [builder  9/10] RUN protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./                                                       0.6s
------
 > [builder  9/10] RUN protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./:
#0 0.552 protoc-gen-go: program not found or is not executable
#0 0.552 Please specify a program using absolute path or make sure the program is available in your PATH system variable
#0 0.553 --go_out: protoc-gen-go: Plugin failed with status code 1.
------
failed to solve: executor failed running [/bin/sh -c protoc -I ./proto/ ./proto/demo.proto --go_out=./ --go-grpc_out=./]: exit code: 1

@austinlparker
Copy link
Member

yeah, I think we've encountered a dependency hell loop. we may have to update our go stuff to use google.golang.org/protobuf rather than protoc-gen-go so we can break into 1.18+... lemme see if that's an easy upgrade

@austinlparker
Copy link
Member

@fatsheep9146 it looks like in order to take newer versions of otel-go we will need to upgrade all go services to support 1.18+. I believe we will also need to update to the newer protobuf package. Can you take care of that for this PR as well?

@fatsheep9146
Copy link
Contributor Author

@fatsheep9146 it looks like in order to take newer versions of otel-go we will need to upgrade all go services to support 1.18+. I believe we will also need to update to the newer protobuf package. Can you take care of that for this PR as well?

Yes, I already found this problem and started doing it. I also found that in 1.18, go get can would not install the package, so the dockerfile should also be updated. @austinlparker

@austinlparker
Copy link
Member

@fatsheep9146 it looks like in order to take newer versions of otel-go we will need to upgrade all go services to support 1.18+. I believe we will also need to update to the newer protobuf package. Can you take care of that for this PR as well?

Yes, I already found this problem and started doing it. I also found that in 1.18, go get can would not install the package, so the dockerfile should also be updated. @austinlparker

Yeah - I would just upgrade to 1.19 across the board, as well as switch to the newer Google protobuf package. Maybe best to do that in a new PR just for cleanliness sake.

@fatsheep9146
Copy link
Contributor Author

@fatsheep9146 it looks like in order to take newer versions of otel-go we will need to upgrade all go services to support 1.18+. I believe we will also need to update to the newer protobuf package. Can you take care of that for this PR as well?

Yes, I already found this problem and started doing it. I also found that in 1.18, go get can would not install the package, so the dockerfile should also be updated. @austinlparker

Yeah - I would just upgrade to 1.19 across the board, as well as switch to the newer Google protobuf package. Maybe best to do that in a new PR just for cleanliness sake.

Ok, I will first submit a new pull request to update the dockerfile and protobuf version. After that pr is merged I will rebase this pr then.

@julianocosta89
Copy link
Member

Hello all,

I've just sent a PR with the required update: #406

Once, this PR is merged we can simply rebase and merge this one.
Already tested locally and it is working 🥳

@cartersocha
Copy link
Contributor

cartersocha commented Oct 7, 2022

@fatsheep9146 the go version is now updated! thanks @julianocosta89

@fatsheep9146
Copy link
Contributor Author

@fatsheep9146 the go version is now updated! thanks @julianocosta89

I also tested it locally, it worked, thanks! @cartersocha @julianocosta89

Copy link
Member

@julianocosta89 julianocosta89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@julianocosta89
Copy link
Member

@reyang anything missing in here?

@reyang
Copy link
Member

reyang commented Oct 7, 2022

@reyang anything missing in here?

Let me take a quick look.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@reyang
Copy link
Member

reyang commented Oct 7, 2022

I don't know if this needs a changelog entry or not. Not a blocker though.

@cartersocha
Copy link
Contributor

I'll cover the changelog update in a separate PR to move this forward

@cartersocha cartersocha merged commit ca30e16 into open-telemetry:main Oct 7, 2022
jmichalak9 pushed a commit to jmichalak9/opentelemetry-demo that referenced this pull request Mar 22, 2024
Signed-off-by: Ziqi Zhao <[email protected]>

Signed-off-by: Ziqi Zhao <[email protected]>
Co-authored-by: Carter Socha <[email protected]>
Co-authored-by: Juliano Costa <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants