jaeger-query can't connect to Cassandra running on Azure #812

Closed
jmhon08 opened this issue May 8, 2018 · 12 comments

Comments

@jmhon08

jmhon08 commented May 8, 2018

I have Cassandra running on Azure (https://docs.bitnami.com/azure-templates/infrastructure/cassandra/, cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4) and am able to connect to it on my machine using ./cqlsh 104.42.116.80 9042 -u theusername -p thepassword. When I try to connect my jaeger-query to it with

./jaeger-query --query.static-files=/go/jaeger-ui/ --cassandra.servers=104.42.116.80 --cassandra.port=9042 --cassandra.username=theusername --cassandra.password=thepassword

I see

{"level":"info","ts":1525821593.4915025,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":16687,"status":"unavailable"}

But then nothing more appears. It stays like that forever. If I give it a bad username, it gives me an error about "unable to connect to initial hosts: Provided username blahblah and/or password are incorrect" so it seems like it is connecting to Cassandra fine, but not running the agent properly. I can't figure out a way to get more logs to appear to show me what could be wrong (I tried --log-level=debug, but still only see the one log line appear).

Does anyone know what could be going on here?

@jpkrohling
Contributor

Unfortunately, it looks like I can't get a free Azure account to try this out (I might have had an older, non-free account). Do you see anything in the Cassandra logs?

@burmanm, do you know what might be going on here?

@jmhon08
Author

jmhon08 commented May 9, 2018

Here are the trace logs from when I run the command to start jaeger-query: https://gist.github.com/jmhon08/21c510aad318cfeea26e318f2d43cb85. I don't see any errors, but I'm also not that familiar with Cassandra.

I tried to run the jaeger-query command inside the Azure node running Cassandra

./jaeger-query --query.static-files=jaeger-ui-build/build/ --cassandra.servers=0.0.0.0 --cassandra.port=9042 --cassandra.username=theusername --cassandra.password=thepassword

and it worked fine.

@xihw

xihw commented May 9, 2018

Actually here are my observations:

  1. Starting jaeger-query from inside of the cluster by using any node's private IP succeeds.
  2. Starting jaeger-query from outside of the cluster by using the public IP still succeeds after hanging for around 40 minutes!

@jmhon08
Author

jmhon08 commented May 9, 2018

OK, so the issue is not that it can't connect from my machine, it's that it takes 40 minutes. Does this seem like a Jaeger issue or an Azure issue?? If it's more likely an Azure one, I could open up a ticket with them.

@black-adder
Contributor

I've had something very similar happen when I tried hitting our c* cluster from my localhost. However we were never able to reproduce in production. It might have something to do with the gocql driver. Perhaps we could try upgrading it and checking if it has any impact?

@jmhon08
Author

jmhon08 commented May 11, 2018

Upgrading gocql to commit version 181004e14a3fb735efcc826a4256369d0c96747b made it able to connect within 10 seconds! Thanks for the tip!!

Note: After upgrading, building main.go for query failed with the error github.com/jaegertracing/jaeger/pkg/cassandra/config/config.go:136:10: cannot use tls.Config literal (type tls.Config) as type *tls.Config in field value, but we fixed it with the "&" mentioned in #651.
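For readers unfamiliar with that compile error, here is a small illustration of what the "&" fix amounts to. It does not reproduce Jaeger's config.go: `sslOptions` and `newSSLOptions` are hypothetical stand-ins for an options struct whose TLS field is a pointer (`*tls.Config`) rather than a value.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// sslOptions is a hypothetical options struct that embeds *tls.Config,
// so the field expects a pointer, not a tls.Config value.
type sslOptions struct {
	*tls.Config
	EnableHostVerification bool
}

// newSSLOptions builds the options for the given server name.
func newSSLOptions(serverName string) *sslOptions {
	return &sslOptions{
		// Without the "&" this fails to compile with:
		// cannot use tls.Config literal (type tls.Config) as type *tls.Config in field value
		Config: &tls.Config{
			ServerName: serverName,
		},
		EnableHostVerification: true,
	}
}

func main() {
	opts := newSSLOptions("cassandra.example.com")
	fmt.Println(opts.ServerName, opts.EnableHostVerification)
}
```

Taking the address of the literal is enough when the field type changes from a value to a pointer; nothing else in the struct literal has to change.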

jmhon08 closed this as completed May 11, 2018

@jpkrohling
Contributor

> Upgrading gocql to commit version 181004e14a3fb735efcc826a4256369d0c96747b made it able to connect within 10 seconds! Thanks for the tip!!

We might want to update this dependency then. Would you mind submitting a PR for this?

@burmanm
Contributor

burmanm commented May 15, 2018

There was also a recent issue where @yurishkuro mentioned updating the gocql version. Did that ever happen?

@black-adder
Contributor

black-adder commented May 16, 2018

We haven't gotten around to bumping it yet but we're open to it. Just have to ensure that there's no performance regression.

This was referenced May 16, 2018
@carlislk

carlislk commented May 21, 2018

I'm having the same issue as mentioned above, in AWS, with both jaeger-collector and jaeger-query. The connection takes ~50 min.

{"level":"info","ts":1526512891.8658774,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1526516036.9510841,"caller":"cassandra/factory.go:92","msg":"Cassandra archive storage configuration is empty, skipping"}
{"level":"info","ts":1526516036.9522612,"caller":"static/strategy_store.go:76","msg":"No sampling strategies provided, using defaults"}
{"level":"info","ts":1526516036.9523625,"caller":"collector/main.go:140","msg":"Registering metrics handler with HTTP server","route":"/metrics"}
{"level":"info","ts":1526516036.9524033,"caller":"collector/main.go:148","msg":"Starting Jaeger Collector HTTP server","http-port":14268}
{"level":"info","ts":1526516036.9524224,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}

@jmhon08 Could you expand on your fix to this issue?

> Upgrading gocql to commit version 181004e14a3fb735efcc826a4256369d0c96747b made it able to connect within 10 seconds! Thanks for the tip!!
>
> Note: Upon upgrading and building main.go for query, we ran into an error github.com/jaegertracing/jaeger/pkg/cassandra/config/config.go:136:10: cannot use tls.Config literal (type tls.Config) as type *tls.Config in field value, but we fixed it with the "&" mentioned here #651

Running:
docker run --rm -d --name jaeger-collector -p14267:14267 -p14268:14268 -p9411:9411 -e LOG_LEVEL=DEBUG -e CASSANDRA_SERVERS=<server-ip> -e CASSANDRA_PORT=9042 -e CASSANDRA_KEYSPACE=jaeger_v1_local -e CASSANDRA_USERNAME=cassandra -e CASSANDRA_PASSWORD=cassandra -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 jaegertracing/jaeger-collector:latest

docker run -d --name jaeger-query -p 16686:16686 -e CASSANDRA_KEYSPACE=jaeger_v1_local -e CASSANDRA_SERVERS=<server-ip> -e CASSANDRA_PORT=9042 jaegertracing/jaeger-query:latest

@black-adder
Contributor

@carlislk The fix is in this ticket: https://github.com/jaegertracing/jaeger/pull/829/files which I'm going to land post haste. I'll see if we can cut a release once it's landed.

@carlislk

carlislk commented May 22, 2018

Much appreciated. Thanks for the quick response. Cheers!
