Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for distributed tracing #1019

Merged
merged 16 commits into from
Sep 14, 2018

Conversation

aaslamin
Copy link
Contributor

@aaslamin aaslamin commented Sep 1, 2018

Addresses issue: #931

So what does this PR do?

• Adds support for the Jaeger Distributed Tracing System
• Docker setup for loading up Jaeger with Hydra

Notes 🗒:

• The tracer can be configured w/ sampling via config https://www.jaegertracing.io/docs/sampling/
• By using the OpenTracing API, which is vendor neutral, will make adding support for other OpenTracing compliant tracers straightforward.
• This is the first step...the real work is to follow as we will need to span & propagate all teh things! (e.g. db lookups, encryption, calls to the consent app, etc.)

Cool, how can I get this up and running locally? 🎩 💻

I have updated the docker-compose to bring up the Jaeger tracer and added config for the Hydra container to use it as the collector.

• Clone Hydra to your GOPATH
• Checkout to this branch
• Run docker-compose up --build
(Good time to grab a ☕️...)
• Make a request to Hydra now, for example:

docker exec -it hydra_hydra_1 \
    hydra clients create \
    --endpoint http://localhost:4445 \
    --id my-client \
    --secret secret \
    -g client_credentials

• Open up your browser to: http://localhost:16686/search
• From the dropdown, select Hydra
image
• At the bottom, click on Find Traces
• You should be able to see your first trace 🎉:
image

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! This is great, especially for a first take! I added some comments on things.

If this gets merged, I'd actually like to see it in the gang of three (hydra,oathkeeper,keto). Maybe I'll create a repo for a lib for this.

cmd/root.go Outdated
viper.SetDefault("TRACING_PROVIDER", "")

viper.BindEnv("TRACING_PROVIDER_JAEGER_SAMPLING_SERVER_URL")
viper.SetDefault("TRACING_PROVIDER_JAEGER_SAMPLING_SERVER_URL", "http://localhost:5778/sampling")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be empty, we can't assume that this service is running on that port and doing so could even be a security issue (it's a different service/malware not explicitly configured)

cmd/root.go Outdated
viper.SetDefault("TRACING_PROVIDER_JAEGER_SAMPLING_VALUE", float64(1))

viper.BindEnv("TRACING_PROVIDER_JAEGER_LOCAL_AGENT_HOST_PORT")
viper.SetDefault("TRACING_PROVIDER_JAEGER_LOCAL_AGENT_HOST_PORT", "127.0.0.1:6831")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, don't assume default hostnames!

cmd/root.go Outdated
viper.SetDefault("TRACING_PROVIDER_JAEGER_LOCAL_AGENT_HOST_PORT", "127.0.0.1:6831")

viper.BindEnv("TRACING_SERVICE_NAME")
viper.SetDefault("TRACING_SERVICE_NAME", "Hydra")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ORY Hydra

config/config.go Outdated
@@ -187,6 +195,28 @@ func (c *Config) GetLogger() *logrus.Logger {
return c.logger
}

func (c *Config) GetTracer() *tracing.Tracer {
if c.tracer == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any stats on reliability and/or network performance, specifically wrt to memory use under high load? If it's minimal we can have this always enabled. If it isn't, it should be either enabled by default (but possible to disable) or disabled by default (maybe the better option).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current setup, the opentracing.GlobalTracer() will be set to the NoopTracer if one does not provide any tracing config (or if they provide invalid tracing config) when booting up Hydra. It is disabled by default.

I use the IsLoaded helper later to determine as to whether to append the tracer middleware.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, so omitting TRACING_PROVIDER disables tracing?

Copy link
Contributor Author

@aaslamin aaslamin Sep 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly - or if you provide garbage config (I log the error incase they want to debug)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit tests to cover above.

@@ -80,6 +84,17 @@ services:
- POSTGRES_PASSWORD=secret
- POSTGRES_DB=hydra

jaeger:
image: jaegertracing/all-in-one:latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with :latest tags is that, while they do fetch the latest tag, they don't fetch those the next time you're running :latest. This can lead to problems where machine A uses version 1.0 and machine B uses 2.0 although both have the same tag. This can then lead to a ton of confusion and issues. It's much better (and safer) to explicitly define the version.

That being said, I don't think jaeger belongs in this docker-compose file. Instead I'd love to see either a docker command (e.g. docker run --network --rm ..... jeagertracing/all-in-one-latest) that you can just run "on top" of this docker-compose set up, or alternatively add this to one of the ory/examples. If it's the latter, please coordinate with me as I'm currently working on that repository and changing some substantial things.

Copy link
Contributor Author

@aaslamin aaslamin Sep 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! We can include instructions for how to bring up tracing somewhere in the docs?

I will comment out the tracing specific config and remove the jaeger container from the docker-compose file.

Update: actually on second thought, I guess I should remove all of the config for tracing from the default docker-compose file.

next(rw, r)

ext.HTTPMethod.Set(span, r.Method)
if negroniWriter, ok := rw.(negroni.ResponseWriter); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this line, are you sure this is negroni? I think it would be better do define your own interface here or use one from the standard library

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure that this is the negroni response writer because of https://github.com/urfave/negroni/blob/master/negroni.go#L96

https://github.com/urfave/negroni/blob/master/response_writer.go#L31

With that said, to be extra defensive, I capture the boolean returned from the type assertion to verify if the ResponseWriter is indeed of type negroini.ResponseWriter. All of this is so we can add the tag in our span for the http status that was returned:

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see how this works now. This is a negroni middleware so we're getting the correct RW here. I though it worked differently, my bad :)

}

func (t *Tracer) Setup() {
switch t.Provider {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to lowercase this so JAEGER and jaeger are equivalent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea 👍, will update

t.closer = closer
t.tracer = opentracing.GlobalTracer()
t.Logger.Infof("Jaeger tracer configured!")
default:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should have two cases here:

case "":
// disabled
default:
// fatal because unknown tracer

By explicitly failing on an unknown tracer, we make debugging much easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will update.

Copy link
Contributor Author

@aaslamin aaslamin Sep 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aeneasr should it be a fatal error in this case if someone misconfigured their tracing config?

The rest of Hydra can still run so I am not sure if we want to bring down everything. Maybe logging at error level is more appropriate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should, because it's an obvious mistake so hydra should make it explicit that it's misconfigured. Better to have a service not start than to discover that some feature didn't work all along that you now urgently need to debug something.

@aaslamin
Copy link
Contributor Author

aaslamin commented Sep 2, 2018

Update:

ℹ️ Added some basic unit tests around setting up the tracer

@@ -130,6 +130,11 @@ func setup(c *config.Config, cmd *cobra.Command, args []string, name string) (ha
handler = NewHandler(c, w)
handler.registerRoutes(frontend, backend)
c.ForceHTTP, _ = cmd.Flags().GetBool("dangerous-force-http")
tracer := c.GetTracer()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to:

if tracer := c.GetTracer(); tracer.IsLoaded() {
  middlewares = append(middlewares, tracer)
}

@@ -208,6 +213,13 @@ func serve(c *config.Config, cmd *cobra.Command, handler http.Handler, address s
},
})

srv.RegisterOnShutdown(func() {
tracer := c.GetTracer()
if tracer.IsLoaded() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, or depending on what Close() returns (if it returns nothing) maybe:

if tracer := c.GetTracer(); tracer.Loaded() {
  srv.RegisterOnShutdown(tracer.Close)
}

)

if err != nil {
t.Logger.Errorf("Could not initialize jaeger tracer: %s", err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be handled "silently", aka no service crash? I'm not sure if it should.

@aaslamin
Copy link
Contributor Author

aaslamin commented Sep 5, 2018

Update:

• Added unit tests for the tracing middleware to ensure the expected number of spans are being created along with the desired tags.

@aaslamin
Copy link
Contributor Author

aaslamin commented Sep 5, 2018

@aeneasr I am ready for another review! 🙏

@aaslamin aaslamin force-pushed the add-support-for-distributed-tracing branch from 193c045 to be6d7c2 Compare September 5, 2018 23:10
Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Only one thing (error ignored) left I think :)

@@ -208,6 +213,10 @@ func serve(c *config.Config, cmd *cobra.Command, handler http.Handler, address s
},
})

if tracer, _ := c.GetTracer(); tracer.IsLoaded() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that there is an error here? If yes, tracer MAY be nil which will cause a panic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, a tracer would have been configured so it would not be possible for it to have an error - recall how we now exit if an error has occurred on boot up.

Although, for safety, incase future refactoring happens by other contributors, it's probably best to do error checking. Will add it in 👍

…to lowercase first. add case for empty string incase.

Signed-off-by: Amir Aslaminejad <[email protected]>
…le as they will be added to the documentation.

Signed-off-by: Amir Aslaminejad <[email protected]>
…at the expected number of spans are created with the appropriate tags.

Signed-off-by: Amir Aslaminejad <[email protected]>
…test to cover scenario where a hydra should continue a trace if one is already present in the request.

Signed-off-by: Amir Aslaminejad <[email protected]>
@aaslamin aaslamin force-pushed the add-support-for-distributed-tracing branch from be6d7c2 to 90847ef Compare September 9, 2018 23:50
@aaslamin
Copy link
Contributor Author

@aeneasr addressed latest feedback re: error handling 👍

@aeneasr
Copy link
Member

aeneasr commented Sep 14, 2018

Thank you for your hard work!

@aeneasr aeneasr merged commit 1cd4d17 into ory:master Sep 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants