
jaeger-collector dropping spans & traces #1411

Closed
azuretek opened this issue Mar 8, 2019 · 3 comments
azuretek commented Mar 8, 2019

Requirement - what kind of business use case are you trying to solve?

Using jaeger-collector to write spans & traces to Kafka, and then jaeger-ingester to store them in Cassandra.

Problem - what in Jaeger blocks you from solving the requirement?

I'm running at least 6 jaeger-collector instances, and after a seemingly random amount of time (in this case ~7 hours) they start dropping what seems like the majority of spans and traces.

I'm not seeing anything in the logs that indicates a problem producing to or communicating with Kafka; in fact, it looks like all logging stops on all my pods.

I've configured the collectors with these settings:

collector:
    Image:       jaegertracing/jaeger-collector:1.9.0
    Ports:       14267/TCP, 14268/TCP, 9411/TCP
    Args:
      --collector.port=14267
      --collector.http-port=14268
      --collector.zipkin.http-port=9411
      --kafka.brokers=kafka-0.broker:9092,kafka-1.broker:9092,kafka-2.broker:9092
      --kafka.encoding=protobuf
      --kafka.topic=traces
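
For completeness, the matching ingester side of the pipeline looks roughly like this. This is a sketch only: the flag names follow the 1.9-era ingester CLI as I recall them, and the Cassandra host and keyspace below are placeholders, so treat all of it as approximate.

ingester:
    Image:       jaegertracing/jaeger-ingester:1.9.0
    Args:
      --kafka.brokers=kafka-0.broker:9092,kafka-1.broker:9092,kafka-2.broker:9092
      --kafka.encoding=protobuf
      --kafka.topic=traces
      --cassandra.servers=cassandra:9042     (placeholder host)
      --cassandra.keyspace=jaeger_v1_dc1     (placeholder keyspace)
    Environment:
      SPAN_STORAGE_TYPE:  cassandra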

This is what our Prometheus metrics look like when the problem is happening. The spans kept dropping until I restarted all the jaeger-collector pods; then, ~7 hours later, the problem appeared again and again resolved once I restarted the pods.

[Screenshot: Prometheus graph of the collector's span metrics, 2019-03-08 12:30 AM]

@yurishkuro
Member

There should be metrics from the Kafka writer; I would check whether you started getting write errors or increased latency.
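
One quick way to eyeball those metrics is to hit the collector's admin endpoint directly. A minimal sketch, assuming the 1.9 default health-check/admin port of 14269 and filtering for anything Kafka- or drop-related (adjust the port and the grep pattern to whatever your collector actually exposes):

    kubectl port-forward <collector-pod> 14269:14269
    curl -s http://localhost:14269/metrics | grep -iE 'kafka|dropped'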

@azuretek
Author

azuretek commented Mar 8, 2019

This is what the two graphs look like next to each other.

[Screenshot: the Kafka writer metrics and the collector metrics side by side, 2019-03-08 9:47 AM]

Showing the rate of successes according to the collector
[Screenshot: rate of successful span writes as reported by the collector, 2019-03-08 10:05 AM]

I can't enable ingester metrics due to this issue: #1200
But the problem resolves when I restart the collectors; the ingesters seem to keep chugging along just fine.
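
For reference, the "restart" here is just deleting the collector pods so the Deployment recreates them. The label selector below is an assumption based on typical jaeger-collector labels, so adjust it to match your manifests:

    kubectl delete pods -l app=jaeger-collector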

@yurishkuro
Member

Old issue, hasn't come up in 5 years, closing.
