Can't avoid duplicate metrics in short lived cloud functions e.g. GCP Cloud Function #35522
@psx95, I know you looked into this recently. Can you respond? Also @AkselAllas, can you share more about your setup? Is your application sending to a collector, or directly to Google Cloud?
@AkselAllas can you share your collector config? Since Cloud Monitoring can only accept points every 5 seconds, you will need to aggregate over time to avoid errors. Something like https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/intervalprocessor should be what you need, but it is listed as being under development right now.
@dashpole How would the linked processor work with a 10 sec batch? I have e.g.
I haven't tried it, but since it aggregates metrics over time, I would expect it to replace the batch processor in your setup.
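For reference, a minimal sketch of what that swap might look like in a collector config. This is an assumption based on the interval processor's README (the `otlp` receiver and `googlecloud` exporter names are placeholders for whatever your pipeline actually uses):

```yaml
processors:
  # Replaces the batch processor in the metrics pipeline; aggregates
  # points over the configured interval before passing them on.
  interval:
    interval: 60s

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [interval]   # was: [batch]
      exporters: [googlecloud]
```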
@AkselAllas, force flushing metrics in quick succession at such a high rate would not be helpful, as the granularity of metrics for Cloud Monitoring can only be, at the very minimum, a 10 second interval. You are likely to run into errors if you're exporting more frequently than every 10 seconds. The core of your problem, as you have correctly identified, is the detachment of CPU once your function completes, which prevents background export. You can work around this by deploying with always-allocated CPU:

```shell
# --no-cpu-throttling triggers always-allocated CPU
gcloud beta run deploy cloud-func-helloworld2 \
  --no-cpu-throttling \
  --container app-function \
  --function org.example.HelloWorld \
  --build-env-vars-file=config/build-env-vars.yaml \
  --source=build/libs \
  --port=8080 \
  --container otel-collector \
  --image=us-central1-docker.pkg.dev/your-gcp-project/your-artifact-registry/otel-collector:latest
```

Alternatively, once you have deployed a Cloud Function, you could enable this setting manually from the GCP console.

I have personally tried this with the following function code:

```java
package org.example;

import com.google.cloud.functions.HttpFunction;
import com.google.cloud.functions.HttpRequest;
import com.google.cloud.functions.HttpResponse;
import io.opentelemetry.api.metrics.LongCounter;
import java.util.Random;

public class HelloWorld implements HttpFunction {
  private static final OpenTelemetryConfig openTelemetryConfig = OpenTelemetryConfig.getInstance();
  private static final LongCounter counter =
      openTelemetryConfig
          .getMeterProvider()
          .get("sample-function-library")
          .counterBuilder("function_counter_psx")
          .setDescription("random counter")
          .build();
  private static final Random random = new Random();

  public HelloWorld() {
    super();
    // Flush and close the SDK when the runtime shuts the instance down.
    Runtime.getRuntime()
        .addShutdownHook(
            new Thread(
                () -> {
                  System.out.println("Closing OTel SDK");
                  openTelemetryConfig.closeSdk();
                  System.out.println("Sdk closed");
                }));
  }

  @Override
  public void service(HttpRequest request, HttpResponse response) throws Exception {
    System.out.println("received request: " + request.toString());
    counter.add(random.nextInt(100));
    response.getWriter().write("Hello, World\n");
    System.out.println("Function exited");
  }
}
```

I was able to export metrics like this to a collector running in a sidecar from a cloud function. The PeriodicMetricReaders were configured with an export interval of 10 seconds. With always-on CPU, I was also able to verify that the shutdown hook was called, and the close() method flushes any pending metrics that you may have remaining (at least in the Java implementation). I have tried this only with Java, but I imagine it would work in NodeJS as well.
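As a generic illustration of the shutdown-hook flush pattern used above, here is a JDK-only sketch. The `pending` list and `flush()` method are stand-ins for the SDK's buffered metric points and `closeSdk()`, not real OTel APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownFlushDemo {
    // Stand-in for the SDK's pending metric points.
    static final List<Integer> pending = new ArrayList<>();

    // Stand-in for openTelemetryConfig.closeSdk(): exports whatever is buffered.
    static void flush() {
        System.out.println("flushed " + pending.size() + " points");
        pending.clear();
    }

    public static void main(String[] args) {
        // Register the flush to run on normal JVM exit, as in the function above.
        Runtime.getRuntime().addShutdownHook(new Thread(ShutdownFlushDemo::flush));
        pending.add(42); // record one metric point
        // main returns; the JVM runs the hook and the pending point is flushed.
    }
}
```

The hook only runs on a normal JVM shutdown, which is exactly why v1 environments that kill the instance without one lose the final export.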
@psx95 Thank you for the in-depth response! Our problem currently is that we have tens of v1 Cloud Functions, and they don't support shutdown hooks.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no further activity. To ping code owners, add a component label (see Adding Labels via Comments); if you are unsure which component this issue relates to, please mention the code owners. Pinging code owners: see Adding Labels via Comments if you do not have permissions to add labels yourself.
Feel free to reopen if you have further questions |
Component(s)
exporter/googlecloud
Describe the issue you're reporting
I am calling forceFlush multiple times in quick succession (e.g. twice within 0.5 seconds) because GCP Cloud Functions run once and then detach the CPU, so a periodic metric exporter will often either fail to export (the CPU/network detaches before the export runs) or cause error spam by trying to export after the CPU/network has detached.

Is it possible to call metric forceFlush, e.g. in nodejs

metricReader.forceFlush()

multiple times and somehow not end up with duplicate-metrics errors in the OTel collector? E.g. can I somehow use an OTel collector processor to remove duplicates before export? My main problem is that the duplicate errors create noise in the otelcol_exporter_send_failed_metric_points_total metric, which I am using to detect lost metrics.