Can we ignore spans for specific operations? #814

Closed
jmhon08 opened this issue May 10, 2018 · 32 comments · Fixed by #827
Labels
enhancement, help wanted (features that maintainers are willing to accept but do not have cycles to implement)

Comments

@jmhon08

jmhon08 commented May 10, 2018

We do health checks on all our services by having Marathon make a request to GET /external_ping every 5 seconds. These health checks appear as spans in Jaeger, which is quite noisy. Is there a way to have the collector not store these?

[screenshot: 2018-05-10 at 10:27:25 AM]

@black-adder
Contributor

black-adder commented May 10, 2018

Unfortunately, this isn't supported directly, but there are ways around it. You could add a new preProcessor https://github.com/jaegertracing/jaeger/blob/master/cmd/collector/app/span_processor.go#L32 that filters those spans via some string matching. An alternative is to wait for adaptive sampling, which will allow you to customize sampling rates per operation rather than per service. I'll try to get adaptive sampling open-sourced soon (maybe a month).
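
The string-matching filter suggested above might look roughly like the following. This is only an illustration of the filtering logic, written in Java because that is the language of the client code elsewhere in this thread; the Jaeger collector itself is written in Go. `SpanFilterSketch`, `Span`, and `dropIgnored` are hypothetical names rather than Jaeger APIs, and the operation names are assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: none of these types are part of Jaeger.
class SpanFilterSketch {

    // Hypothetical minimal span model with just the fields the filter needs.
    static final class Span {
        final String serviceName;
        final String operationName;

        Span(String serviceName, String operationName) {
            this.serviceName = serviceName;
            this.operationName = operationName;
        }
    }

    private final Set<String> ignoredOperations;

    SpanFilterSketch(String... ignoredOperations) {
        this.ignoredOperations = new HashSet<>(Arrays.asList(ignoredOperations));
    }

    // Returns a new list holding only the spans whose operation name
    // is not on the ignore list (exact string match).
    List<Span> dropIgnored(List<Span> batch) {
        List<Span> kept = new ArrayList<>();
        for (Span span : batch) {
            if (!ignoredOperations.contains(span.operationName)) {
                kept.add(span);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed operation names; real names depend on the instrumentation.
        SpanFilterSketch filter = new SpanFilterSketch("GET /external_ping", "GET /health");
        List<Span> batch = Arrays.asList(
                new Span("my-service", "GET /external_ping"),
                new Span("my-service", "GET /orders"));
        for (Span span : filter.dropIgnored(batch)) {
            System.out.println(span.serviceName + " " + span.operationName);
        }
    }
}
```

An exact-match set keeps the check cheap per span; a prefix or regular-expression match would be a straightforward variation if operation names carry parameters.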

@yurishkuro
Member

The URL /external_ping does not exist, so I assume the UI's index.html will be returned instead. A better way would be for your infra to ping the /health endpoint, which exists specifically for that purpose. If the /health endpoint is also traced, then we should fix that. I think it's best to disable tracing on both the /health and /metrics endpoints.

@jmhon08
Author

jmhon08 commented May 10, 2018

@yurishkuro we are using an internal health check bundle with the customized path "/external_ping". When I change the health check to ping /health, it still creates a span (see image). Our services are Dropwizard, if that matters. The adaptive sampling per operation sounds like exactly what we want so we will just wait for that. Thanks

[screenshot: 2018-05-10 at 11:40:23 AM]

jmhon08 closed this as completed May 10, 2018
@yurishkuro
Member

So it's already possible to configure the sampling strategy via static config so that the rate for the jaeger-query service and the /health endpoint is 0. But I think that's a roundabout way to go about it; we should simply fix the code to not enable tracing on /health in the first place. I am going to reopen this as an enhancement request.

yurishkuro reopened this May 10, 2018
yurishkuro added the enhancement and help wanted labels May 10, 2018
@jmhon08
Author

jmhon08 commented May 11, 2018

Wait so there's already a way to update our --sampling.strategies-file so that we can filter out a specific endpoint? Is there documentation for this?

@yurishkuro
Member

https://www.jaegertracing.io/docs/sampling/#CollectorSamplingConfiguration

@black-adder
Contributor

Correction: the static config is still per service level, but it shouldn't be hard to make it per operation.

@saivishalvangala

Hi @yurishkuro @black-adder. I am trying to configure sampling at the collector level. I followed the Jaeger documentation at https://www.jaegertracing.io/docs/1.18/sampling/#collector-sampling-configuration.
The following configuration is set in the YAML file of the Jaeger deployment:

containers:
  - name: jaeger
    image: 'jaegertracing/all-in-one:1.19.2'
    args:
      - '--query.ui-config=/etc/config/ui.json'
      - '--sampling.strategies-file=/etc/jaeger/sampling/sampling.json'

Below is my sampling.json file:

{
  "service_strategies": [
    {
      "service": "istio",
      "type": "probabilistic",
      "param": 0,
      "operation_strategies": [
        {
          "operation": "postJson",
          "type": "probabilistic",
          "param": 0
        },
        {
          "operation": "/api/topic/post/xml",
          "type": "probabilistic",
          "param": 0
        }
      ]
    },
    {
      "service": "bar",
      "type": "ratelimiting",
      "param": 5
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.5,
    "operation_strategies": [
      {
        "operation": "/health",
        "type": "probabilistic",
        "param": 0.0
      },
      {
        "operation": "/metrics",
        "type": "probabilistic",
        "param": 0.0
      }
    ]
  }
}

image

I tried different ways to restrict certain operations of the service "istio":

  1. Giving the API endpoint as the operation.
  2. Giving the operation name "postJson".

Finally, I tried to restrict all traces of the service "istio" by setting param: 0. None of these worked for me. Only the client-level sampling configuration that I passed while creating the Tracer bean works; the collector-level sampling configuration has no effect.

Below is how I am creating the Tracer Bean:

@Bean
public io.opentracing.Tracer jaegerTracer() throws OGSGeneralException
{
    // (some lines were elided in the original post)
    // Configuration here is io.jaegertracing.Configuration from jaeger-client-java.
    // The sampler is forced to "const" with param 1, i.e. every trace is sampled.
    final Configuration.SamplerConfiguration samplerConfig = Configuration.SamplerConfiguration.fromEnv()
            .withType("const").withParam(1);
    final Configuration.ReporterConfiguration reporterConfig = Configuration.ReporterConfiguration.fromEnv()
            .withLogSpans(true);
    final Configuration config = new Configuration(applicationName).withSampler(samplerConfig)
            .withReporter(reporterConfig);
    return config.getTracer();
}

Could anyone please help me here to achieve adaptive sampling at the collector level? It is a very critical issue.

@yurishkuro
Member

@saivishalvangala there is no sampling at the collector level. Sampling only happens in the SDKs. The sampling strategy configuration that you can pass to the collector is only passed on to the SDKs.

I am not sure if Istio even supports Jaeger SDK - there were some attempts to link Jaeger C++ SDK, but I don't know what the current status is. Therefore, any configuration you provide to collectors will not have any effect if the traces are started by Istio.

@saivishalvangala

Hi @yurishkuro, thank you for your reply. Actually, "istio" is the name of a microservice that I configured; it has nothing to do with the Istio service mesh.
If there is no collector-level sampling, could you explain what the collector sampling configuration (i.e. adaptive sampling) described at the following link is? https://www.jaegertracing.io/docs/1.18/sampling/#collector-sampling-configuration
It says there that sampling.json is used to sample per operation, so I wanted to use this feature to restrict traces of some operations of my microservice.
My use case is to ignore health and Prometheus checks, which generate traces every 30 seconds, as well as a few application operations.
Could you please help me implement this use case?

@jpkrohling
Contributor

That sampling file is used by the SDKs when the "remote" sampling strategy is used. It allows admins to control centrally the strategy for all clients at once.

@albertteoh
Contributor

albertteoh commented Oct 2, 2020 via email

@saivishalvangala

Hi @albertteoh, thanks for explaining in such detail.
Answers to your questions:

  1. There is no Jaeger agent running as a sidecar on the same host as the microservice "istio".
  2. Yes, the Tracer bean is configured with sampler type "remote".
  3. I don't understand where to observe these logs, because the Jaeger instance is running in a pod that has 4 services: agent, collector, collector-headless, and query.
    image

image

I think those logs will be from jaeger-agent side car. Am I right?

@albertteoh
Contributor

albertteoh commented Oct 6, 2020 via email

@saivishalvangala

saivishalvangala commented Oct 6, 2020

Hi @albertteoh thank you for such valuable information.

First, here is what I tried:

  1. I deployed a Jaeger instance named "with-sampling" and a microservice in the namespace "example-app-new". This microservice emits traces when an endpoint is hit. I followed https://www.jaegertracing.io/docs/1.19/operator/#auto-injecting-jaeger-agent-sidecars and injected the jaeger-agent sidecar.
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '1'
    sidecar.jaegertracing.io/inject: with-sampling

"with-sampling" is the instance name of the Jaeger deployment in the "jaeger-operator" namespace.

  2. The Jaeger endpoint that I configured is:
spec:
  containers:
    - env:
        - name: JAEGER_ENDPOINT
          value: 'http://with-sampling-collector.jaeger-operator:14268/api/traces'
  3. The sampling.json file configured in the collector is:
{
  "service_strategies": [
    {
      "service": "istio",
      "type": "probabilistic",
      "param": 0.0,
      "operation_strategies": [
        {
          "operation": "/istio-arch-type/kafka/topic/post/json",
          "type": "probabilistic",
          "param": 0.0
        },
        {
          "operation": "/istio-arch-type/kafka/topic/post/xml",
          "type": "probabilistic",
          "param": 0.0
        }
      ]
    },
    {
      "service": "example-app",
      "type": "probabilistic",
      "param": 0.8,
      "operation_strategies": [
        {
          "operation": "getHi",
          "type": "probabilistic",
          "param": 0.0
        }
      ]
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.5,
    "operation_strategies": [
      {
        "operation": "/health",
        "type": "probabilistic",
        "param": 0.0
      },
      {
        "operation": "/metrics",
        "type": "probabilistic",
        "param": 0.0
      }
    ]
  }
}
  4. The sampler type given while creating the Tracer bean is "remote":
 @Bean
    public io.opentracing.Tracer jaegerTracer()
    {

        final Configuration.SamplerConfiguration samplerConfig = Configuration.SamplerConfiguration.fromEnv()
                .withType("remote");
        final Configuration.ReporterConfiguration reporterConfig = Configuration.ReporterConfiguration.fromEnv()
                .withLogSpans(true);
        final Configuration config = new Configuration("example-app").withSampler(samplerConfig)
                .withReporter(reporterConfig);

        return config.getTracer();

    }

However, traces are not showing up in the Jaeger UI.

I have a few observations and questions about my configuration:

  1. Should I use jaeger-endpoint still to push traces after sidecar injection?
  2. Is the operation-name "getHi" correct in sampling.json file? The below is the screenshot of trace which is obtained with "const" and 1 sampler configurations.
    image
    image

Could you please correct me if any of my configuration is wrong?
I kindly request your help to achieve:

  1. Remote sampling by getting the sampling configuration from the collector.
  2. Operation-wise sampling, through which I can restrict a few of the operations in my microservice.

@saivishalvangala

saivishalvangala commented Oct 6, 2020

Hi @albertteoh .
Please have a look into the code in the below link and comment on the changes needed to achieve the tasks mentioned above:
https://github.com/saivishalvangala/jaeger-sampling

@albertteoh
Contributor

> Should I use jaeger-endpoint still to push traces after sidecar injection?

You shouldn't need JAEGER_ENDPOINT if you have an agent running. But span submission should still work with JAEGER_ENDPOINT set.

> Is the operation-name "getHi" correct in sampling.json file? The below is the screenshot of trace which is obtained with "const" and 1 sampler configurations.

Yes, that looks fine to me.

Short of viewing the metrics I mentioned above, maybe you can try running example-app locally and stepping through the code to see if the remote config is being successfully fetched from jaeger-agent.

@saivishalvangala

I tried both running locally and deploying to a Kubernetes cluster, but no luck. The remote config is not being fetched from jaeger-agent.

@albertteoh
Contributor

albertteoh commented Oct 7, 2020

Okay, the next questions that come to mind are:

  • Why is the remote config not being fetched from jaeger-agent?
  • Can example-app reach jaeger-agent (port 5778)?

You mentioned you tried running this stack locally. Can you:

  • curl jaeger-collector's /api/sampling endpoint during your local test runs? You should see something like this:
$ curl http://localhost:14268/api/sampling\?service\=tracegen
{"strategyType":"PROBABILISTIC","probabilisticSampling":{"samplingRate":1},"operationSampling":{"defaultSamplingProbability":1,"defaultLowerBoundTracesPerSecond":0,"perOperationStrategies":[{"operation":"lets-go","probabilisticSampling":{"samplingRate":0}},{"operation":"/health","probabilisticSampling":{"samplingRate":0}},{"operation":"/metrics","probabilisticSampling":{"samplingRate":0}}]}}
  • curl jaeger-agent's /sampling endpoint during your local test runs? You should see something like this:
$ curl http://localhost:5778/sampling\?service\=tracegen
{"strategyType":"PROBABILISTIC","probabilisticSampling":{"samplingRate":1},"operationSampling":{"defaultSamplingProbability":1,"defaultLowerBoundTracesPerSecond":0,"perOperationStrategies":[{"operation":"lets-go","probabilisticSampling":{"samplingRate":0}},{"operation":"/health","probabilisticSampling":{"samplingRate":0}},{"operation":"/metrics","probabilisticSampling":{"samplingRate":0}}],"defaultUpperBoundTracesPerSecond":0}}
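
The same two checks can be scripted. The sketch below builds the collector and agent URLs from the curl examples above and prints whatever each endpoint returns. The class and method names are illustrative, and the hosts and ports are assumed to be the local ones shown in those examples.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of any Jaeger library.
class SamplingEndpointCheck {

    // Collector endpoint, as in the first curl example above.
    static String collectorUrl(String host, int port, String service) {
        return "http://" + host + ":" + port + "/api/sampling?service=" + encode(service);
    }

    // Agent endpoint, as in the second curl example above.
    static String agentUrl(String host, int port, String service) {
        return "http://" + host + ":" + port + "/sampling?service=" + encode(service);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Performs a GET and returns the status line followed by the response body.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        int status = conn.getResponseCode();
        InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        StringBuilder out = new StringBuilder("HTTP " + status);
        if (body != null) {
            try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    out.append('\n').append(line);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String service = args.length > 0 ? args[0] : "tracegen";
        String[] urls = {
                collectorUrl("localhost", 14268, service),
                agentUrl("localhost", 5778, service)
        };
        for (String url : urls) {
            try {
                System.out.println(url + " -> " + fetch(url));
            } catch (IOException e) {
                System.out.println(url + " -> unreachable: " + e.getMessage());
            }
        }
    }
}
```

Passing the service name as the first argument makes it easy to compare the strategy returned for each candidate name.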

@saivishalvangala

saivishalvangala commented Oct 12, 2020

Hi @albertteoh ,
Below are screenshots of the curl output for the endpoints mentioned:
image
image
I tried different service names but keep getting 404 Not Found.

One piece of good news is that a jaeger-agent is now injected as a sidecar into the application's pod, and the application is able to fetch the default sampling strategy from the "sampling.json" file configured in the collector. Thanks for your continued help in getting this far.
However, service_strategies are not working; only the default_strategy is picked every time. I strongly suspect there is some issue with the service name that I defined in "sampling.json". I tried different names but no luck.

{
	"service_strategies": [
		{
			"service": "example-app.jaeger-operator",
			"type": "probabilistic",
			"param": 0,
			"operation_strategies": [{
				"operation": "getHi",
				"type": "probabilistic",
				"param": 0
			}]
		}
	]
}

I gave only service_strategies in sampling.json. According to the above "sampling.json", no endpoint of example-app-jaeger should be traced, but the service is picking up the default_strategy and all the traces are showing up in the Jaeger UI.

I tried below service names:

  1. example-app-jaeger :- I tried this because i configured this name while creating Tracer Bean
Configuration.SamplerConfiguration samplerConfig=Configuration.SamplerConfiguration.fromEnv().withType("remote");
final Configuration.ReporterConfiguration reporterConfig = Configuration.ReporterConfiguration.fromEnv().withLogSpans(true);
final Configuration config = new Configuration("example-app-jaeger").withSampler(samplerConfig)
                                              .withReporter(reporterConfig);
  2. example-app.jaeger-operator: I tried this because "JAEGER_SERVICE_NAME: example-app.jaeger-operator" is automatically generated after deployment.
    image
  3. example-app: I tried this because the service name in deployment.yml is "example-app".

This is the screenshot of trace in Jaeger for one endpoint of example-app-jaeger:
image

This is the controller method:

  @GetMapping("/hello")
    @TraceIt
    public ResponseEntity<String> getHeloo()
    {
        return new ResponseEntity<>("Hi", HttpStatus.OK);
    }

Do you see any issue in what I tried? Could you help me get service_strategies and operation_strategies applied to the respective services? I tried different operation names and service names, but no luck.

Regards,
Vishal.

@albertteoh
Contributor

A service name of "example-app-jaeger" with operation "getHi" looks correct to me.

Curious, how many calls to "getHi" are you making and how many traces are you seeing (I know you're expecting 0 traces)?

Reason for asking is, when running this locally with a simple Go app (tracegen), with 0 probability of sampling at both the service level and operation level, interestingly, I'm seeing at least 1 trace come through.

Stepping through the code, the reason for this behaviour is that the probabilistic sampling strategy has a lower-bound sampler that guarantees a minimum sampling rate.

In jaeger-client-go, the per-operation sampler holds an instance of a RateLimiter that has an initial maxBalance of at least 1, even if the lowerBound is 0. After the first trace is emitted, the balance is reduced down to 0 and subsequent updates to add credit to the balance do nothing since creditsPerSecond is 0.

This behaviour is unexpected to me; but maybe my test is flawed. I wonder if anyone in the community can confirm if this is expected behaviour of the lowerbound sampler or if it's a bug? i.e. I would expect no traces at all for tracegen::lets-go given the following sampling.json config:

 {
   "service_strategies": [
     {
       "service": "tracegen",
       "type": "probabilistic",
       "param": 0.0,
       "operation_strategies": [
         {
           "operation": "lets-go",
           "type": "probabilistic",
           "param": 0.0
         }
       ]
     }
   ],
...
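
The rate-limiter behaviour described above can be reproduced with a small model. The sketch below is a simplified credit-based limiter written for illustration, not the jaeger-client-go source. It assumes, as described, that the balance starts at a maximum of at least 1 and is refilled at `creditsPerSecond`. The class and method names are hypothetical, and time is passed in explicitly so the run is deterministic.

```java
// Simplified model of a credit-based rate limiter; illustration only.
class LowerBoundLimiterSketch {
    private final double creditsPerSecond;
    private final double maxBalance;
    private double balance;
    private long lastTickNanos;

    LowerBoundLimiterSketch(double creditsPerSecond, long nowNanos) {
        this.creditsPerSecond = creditsPerSecond;
        // Assumption taken from the discussion above: the balance cap is
        // at least 1, even when the configured rate is 0.
        this.maxBalance = Math.max(creditsPerSecond, 1.0);
        this.balance = this.maxBalance;
        this.lastTickNanos = nowNanos;
    }

    // Refills credit for the elapsed time, then tries to spend `cost`.
    boolean checkCredit(double cost, long nowNanos) {
        double elapsedSeconds = (nowNanos - lastTickNanos) / 1e9;
        lastTickNanos = nowNanos;
        balance = Math.min(maxBalance, balance + elapsedSeconds * creditsPerSecond);
        if (balance >= cost) {
            balance -= cost;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long now = 0L;
        LowerBoundLimiterSketch limiter = new LowerBoundLimiterSketch(0.0, now);
        for (int i = 1; i <= 3; i++) {
            now += 1_000_000_000L; // one simulated second between requests
            System.out.println("request " + i + " sampled: " + limiter.checkCredit(1.0, now));
        }
    }
}
```

With a rate of 0 the first request spends the initial credit and is sampled; every later request is rejected because no credit is ever added back. That matches the single trace observed in the test above.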

@saivishalvangala

saivishalvangala commented Oct 15, 2020

Hi @albertteoh,

I made around 8 calls and I got 8 traces in Jaeger. Looking at sampler.type and sampler.param in the spans of each trace, I see the service_strategy sampling configuration but not that of the operation_strategy.
image
image

I have loaded the following sampling configurations into the collector:

{
	"service_strategies": [{
		"service": "example-app-example-app",
		"type": "probabilistic",
		"param": 0.8,
		"operation_strategies": [{
			"operation": "getHi",
			"type": "probabilistic",
			"param": 0
		}]
	}]
}

I have one important observation.
If I give the operation name as "GET", then all the GET operations are sampled according to the operation_strategy sampling configuration.
Here is a proof of concept:
image
Sampling configurations loaded into collector:

{
	"service_strategies": [{
		"service": "example-app-example-app",
		"type": "probabilistic",
		"param": 0.8,
		"operation_strategies": [{
			"operation": "GET",
			"type": "probabilistic",
			"param": 0.9
		}]
	}]
}

POST operations are sampled only if the operation name is given as "POST".
But I strongly feel this is not the correct way of sampling operations, as we may have different traffic for different GET operations and need to sample each of them differently.

This behavior is unexpected to me, but maybe my test is flawed. Could anyone in the community confirm whether this behavior is expected or a bug?
@yurishkuro @black-adder @jpkrohling @jmhon08 @albertteoh I need all your help here.

Regards,
Vishal.

@yurishkuro
Member

I would verify that the strategy returned by the agent (curl the corresponding endpoint) for your service matches your expectation.

What you describe about GET/POST is very odd behavior. Sampling does not look at HTTP methods; it only looks at the root span name as a proxy for the endpoint name. So unless you're using GET as the span name (which did not look to be the case in your example), the strategy with the GET operation should never match.
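
To make the matching rule concrete, here is a minimal sketch of a per-operation lookup keyed on the span's operation name. It is a simplified model, not the Jaeger SDK's implementation: the class and method names are hypothetical, and it leaves out the lower-bound rate limiter discussed earlier in the thread. Under this model a strategy for "GET" applies only to a root span literally named "GET".

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of per-operation probability lookup; illustration only.
class OperationSamplingSketch {
    private final double defaultProbability;
    private final Map<String, Double> perOperation = new HashMap<>();

    OperationSamplingSketch(double defaultProbability) {
        this.defaultProbability = defaultProbability;
    }

    // Registers a probability for one exact operation (root span) name.
    OperationSamplingSketch withOperation(String operationName, double probability) {
        perOperation.put(operationName, probability);
        return this;
    }

    // Exact string match on the operation name; anything else falls back
    // to the service-level default. HTTP methods and tags play no part.
    double probabilityFor(String operationName) {
        Double probability = perOperation.get(operationName);
        return probability != null ? probability : defaultProbability;
    }

    public static void main(String[] args) {
        // Mirrors the second configuration above: service 0.8, operation "GET" 0.9.
        OperationSamplingSketch strategy =
                new OperationSamplingSketch(0.8).withOperation("GET", 0.9);
        System.out.println("getHi -> " + strategy.probabilityFor("getHi"));
        System.out.println("GET   -> " + strategy.probabilityFor("GET"));
    }
}
```

If spans really are being matched by "GET", the likeliest explanation under this model is that the instrumentation names the root span "GET", which can be checked by looking at the operation name of the root span in the UI.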

@saivishalvangala

Hi @yurishkuro,

I was also surprised by this behavior of operation-wise sampling. But I am fairly sure this is how it works, as I tested it two or three times before posting here. GET/POST are treated as the operation names of the respective HTTP methods, rather than the actual operation name of the trace.

You can refer to my previous comments and observations that I posted. Please correct me if my observations are wrong.

Regards,
Vishal.

@yurishkuro
Member

@saivishalvangala what you're describing is physically impossible: Jaeger samplers do not even have access to tags on the span, they only have access to the operation name. Consider providing a reliable reproducer if you want us to investigate. The example you linked (https://github.com/saivishalvangala/jaeger-sampling) does not build for me, and it uses a const sampler, for which the external sampling strategies have no effect.

@saivishalvangala

saivishalvangala commented Oct 20, 2020

Hi @yurishkuro,

In https://github.com/saivishalvangala/jaeger-sampling, please use deployment.yaml, which sets the sampler type to remote, rather than deployment1.yaml to reproduce the issue I described.
I added an executable jar in a zip archive, Jar.zip, and pushed it to the same repository.
I have also pushed a few dependency version changes to the repository. Please try building the code once more; if it still fails, please use that jar to build an image and proceed with deployment.yaml.

Regards,
Vishal

@yurishkuro
Member

@saivishalvangala I don't need the deployment or a jar, I need to be able to compile and run the code from source, otherwise I cannot investigate.

@saivishalvangala

@yurishkuro please try to compile and run the code now. It will work for you.

@yurishkuro
Member

yurishkuro commented Oct 21, 2020

I am getting the same error:

$ mvn verify
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project example-app: Compilation failure: Compilation failure:
[ERROR] /Users/ysh/dev/saivishalvangala/jaeger-sampling/src/main/java/com/cgi/ogs/exampleapp/trace/TraceAspect2.java:[4,24] package org.aspectj.lang does not exist
[ERROR] /Users/ysh/dev/saivishalvangala/jaeger-sampling/src/main/java/com/cgi/ogs/exampleapp/trace/TraceAspect2.java:[5,35] package org.aspectj.lang.annotation does not exist
[ERROR] /Users/ysh/dev/saivishalvangala/jaeger-sampling/src/main/java/com/cgi/ogs/exampleapp/trace/TraceAspect2.java:[6,35] package org.aspectj.lang.annotation does not exist
[ERROR] /Users/ysh/dev/saivishalvangala/jaeger-sampling/src/main/java/com/cgi/ogs/exampleapp/trace/TraceAspect2.java:[23,2] cannot find symbol
[ERROR]   symbol: class Aspect
[ERROR] /Users/ysh/dev/saivishalvangala/jaeger-sampling/src/main/java/com/cgi/ogs/exampleapp/trace/TraceAspect2.java:[56,32] cannot find symbol
[ERROR]   symbol:   class ProceedingJoinPoint
[ERROR]   location: class com.cgi.ogs.exampleapp.trace.TraceAspect2
[ERROR] /Users/ysh/dev/saivishalvangala/jaeger-sampling/src/main/java/com/cgi/ogs/exampleapp/trace/TraceAspect2.java:[55,6] cannot find symbol
[ERROR]   symbol:   class Around
[ERROR]   location: class com.cgi.ogs.exampleapp.trace.TraceAspect2

@rjrakesh

Yes, we can do that with the help of tracing options: add IgnorePatterns to your trace.json and use the aspnetcorediagnostics options to filter out these traces.
{
  "TracingOptions": {
    "Host": "http://localhost",
    "Port": 14268,
    "Debug": "True",
    "IgnorePatterns": [ "/hc", "/liveness" ]
  }
}

@gfernandezcuri

Hi, I tried to use the sampling configuration based on the Jaeger documentation (https://www.jaegertracing.io/docs/1.38/sampling/#CollectorSamplingConfiguration), but I still get span metrics in Jaeger.
Has anyone found a solution? Please help.

@ie-pham

ie-pham commented Jan 10, 2023

I have the same question/issue as @albertteoh . For operations that we've set to

param: 0

we are still seeing traces being sampled. Is there a way to completely ignore/not sample spans of specific operations?
Screenshot 2023-01-10 at 2 56 26 PM


9 participants