-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add healthcheck to standalone #784
Add healthcheck to standalone #784
Conversation
Signed-off-by: Eundoo Song <[email protected]>
Signed-off-by: Eundoo Song <[email protected]>
Signed-off-by: Eundoo Song <[email protected]>
So this is actually adding two independent health check endpoints, for collector and query. Could you describe in the PR why this is needed, how it can be used? Our objective is to only add features that are actually useful to people (and documented!). Thanks. |
I think it would be better for people who want to use health check in the same way as separated one. Thanks. |
But how would you use it? E.g. our internal deployment system and discovery system (part of routing) only support one health endpoint per service. I am not suggesting a separate endpoint (we serve metrics from one port only, healthcheck would be on the same port), but I think there should be only one health status for the binary, not two that are potentially conflicting. |
+1, from an operator's perspective, it makes more sense to have one endpoint that reports the health of the system in general than one health check endpoint for each component. |
I don't get this, does this mean healthcheck is using the same port as metrics? I agree that one endpoint of the health check per service(binary) makes more sense.
I think the implementation will not be many changes, but I'm wondering which port number can be used. |
You don't need a new port, all-in-one already opens several HTTP ports, you can just add an endpoint to one of them. E.g. we define ports for the |
…jaeger into healcheck_standalone
Signed-off-by: eundoo-song <[email protected]>
Your suggestion looks good. But since health check is using its own HTTP server(with ServeMux) and the other service(collector, query) is using Gorilla Mux, it's a bit difficult for health check to use other Mux without change in health check. I did implement (eundoosong@f6a8f5d) but IMO it doesn't look better in term of design. So I decided collector port for health check(port num is 14269) is used for standalone health check port. |
cmd/standalone/main.go
Outdated
func(flagSet *flag.FlagSet) { | ||
flagSet.Int(healthCheckHTTPPort, defaultHealthCheckPort, | ||
"The http port for the health check services") | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
health-check for standalone is implemented inside main because there is only one flag existed in standalone.
cmd/standalone/main.go
Outdated
|
||
if err := command.Execute(); err != nil { | ||
fmt.Println(err.Error()) | ||
os.Exit(1) | ||
} | ||
} | ||
|
||
func hideQueryCollectorFlags(command *cobra.Command) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hides query and collector health check flags.
collectorHTTPPort = "collector.http-port" | ||
collectorZipkinHTTPort = "collector.zipkin.http-port" | ||
// CollectorHealthCheckHTTPPort is the flag name for health check | ||
CollectorHealthCheckHTTPPort = "collector.health-check-http-port" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better approach would be to have a single global flag for all binaries, similar to how we do log levels or metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions,
- Does this mean we need to take that flag out from collector flags and make a global flag?
similar to how we do log levels or metrics.
- Are you talking about BasicOptions(cmd/builder/builder_options.go)? How about SharedFlags(cmd/flags/..)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 - yes
2 - I meant shared flags, the BasicOption is independent of cli flags
Signed-off-by: Eundoo Song <[email protected]>
cmd/flags/flags.go
Outdated
// AddFlags adds flags for SharedFlags | ||
func AddFlags(flagSet *flag.FlagSet) { | ||
flagSet.String(spanStorageType, "", fmt.Sprintf("Deprecated; please use %s environment variable", storage.SpanStorageTypeEnvVar)) | ||
flagSet.String(logLevel, "info", "Minimal allowed log Level. For more levels see https://github.com/uber-go/zap") | ||
flagSet.Int(healthCheckHTTPPort, 0, "The http port for the health check service") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default port is removed since it's difficult to pass in the current interface. So I let NewHealthCheck set it when each component initializes health check.
Do we need to add a default port in help message? Maybe like below..
flagSet.Int(healthCheckHTTPPort, 0, "The http port for the health check service. For default port, see https://https://www.jaegertracing.io/XXXXX")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better of the default port was printed by help. One option to do that is to define it in a global variable, e.g. defaulted to collector port, and then query service main() can override that var at the beginning (before registering flags)
Signed-off-by: Eundoo Song <[email protected]>
startQuery(qOpts, spanReader, dependencyReader, logger, metricsFactory, mBldr) | ||
startCollector(cOpts, spanWriter, logger, metricsFactory, samplingHandler, hc) | ||
startQuery(qOpts, spanReader, dependencyReader, logger, metricsFactory, mBldr, hc) | ||
hc.Ready() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily know how to fix it, but there is a race condition here where collector or query background goroutines can fail to start the server and set HC to Unavailable, but this statement will override it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside from race issue in all-in-one, lgtm
a0d42ad
to
64b9368
Compare
cmd/standalone/main.go
Outdated
logger.Info("Starting jaeger-query HTTP server", zap.Int("port", qOpts.Port)) | ||
go collectorServer() | ||
go queryServer() | ||
hc.Ready() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move goroutines to main from each start function, and start them in main then set Ready.
IMO, this will prevent the race issue.
For 100% guarantees, can we move hc.Ready before goroutines like this?
hc.Ready()
go collectorServer()
go queryServer()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think returning functions like this provides any additional guarantees. I think it's fine to leave it as it was, i.e. call startCollector()
, it might fail mid-way and crash the program, or it can spawn goroutine for ListenAndServe, which can fail as well and crush the program later. So there will be a possible time interval where HC==Ready, and then a crash, but I don't think it's critical for all-in-one.
Sorry I led you on the wrong path with the last comment. The fact that the goroutines call log.Fatal on error sort of take care of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted hc.Ready() to be called before collector/query goroutine call hc.Set(healthcheck.Unavailable) in ListenAndServe error.
I think this would prevent overriding Unavailable by hc.Ready() as you mentioned above coment.
Anyway, if the issue in all-in-one is not critical and fatal log on error is enough, I'll revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't think it's critical sice even if the override happens the process will exit anyway.
64b9368
to
466f7b7
Compare
Reverted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* master: (38 commits) Preparing release 1.5.0 (jaegertracing#847) Add bounds to memory storage (jaegertracing#845) Add metric for debug traces (jaegertracing#796) Change metrics naming scheme (jaegertracing#776) Bump gocql version (jaegertracing#829) Remove ParentSpanID from domain model (jaegertracing#831) Make gas run quiet (jaegertracing#838) Revert "Make gas run quite" Revert "Install gas from install-ci" Install gas from install-ci Make gas run quite Add 'gas' for security problems scanning (jaegertracing#830) Add ability to adjust static sampling probabilities per operation (jaegertracing#827) Support log-level flag on agent (jaegertracing#828) Remove unused function (jaegertracing#822) Add healthcheck to standalone (jaegertracing#784) Do not use KeyValue fields directly and use KeyValues as decorator only (jaegertracing#810) Add ContaAzul to the adopters list (jaegertracing#806) Add ISSUE_TEMPLATE and PULL_REQUEST_TEMPLATE (jaegertracing#805) Upgrade to go 1.10 (jaegertracing#792) ... # Conflicts: # cmd/agent/app/builder.go # cmd/collector/main.go # cmd/query/main.go # cmd/standalone/main.go
healthcheck is added in the same way as collect and query since it is missed in standalone.
Related issue : #775
Signed-off-by: Eundoo Song [email protected]