[Azure Monitor] Expected errors are reported in Application Insights for Blob storage operations #9908
Comments
//fyi @jsquire
@kinelski: Can you take a look and see if this is because of the conditional access that we're making with ownership requests? I'm trying to determine if these are expected and whether either Event Hubs or Storage is surfacing something that it shouldn't as an error.
@sebader Thank you for reporting this issue. Could you tell us how many Event Processor Client instances are being used in your scenario? This might help us understand the nature of the problem.
It's auto-scaling between 4 and 32 instances on AKS, depending on the load.
@sebader: Apologies for the difficulties and thank you for bringing this to our attention. While we certainly agree that having these errors appear is confusing and not the experience that we want to offer, it is, unfortunately, by design in the current implementation. For context, the diagnostics emitting these errors come from the Storage client that Event Hubs uses and are based entirely on the response. Because it is a 4xx series response, it is automatically interpreted by the diagnostics framework as an error. The Event Processor makes a conditional request when trying to claim ownership of a partition for processing. Because processor instances compete for ownership, it is expected that many of these requests do not succeed due to another instance having already claimed the partition. Within Event Hubs, this is treated as a normal code path. However, at that point, the Storage client has already registered the failure with its diagnostics. I've opened #9934 as a feature request for exposing the ability to treat service responses that are expected and normal for the consuming application as non-failures.
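To illustrate, here is a minimal sketch (not the Event Processor's actual internal code) of how an ETag-conditional Blob storage call surfaces a 412 when another instance wins the ownership race; the container, blob, and metadata names and the connection string are placeholders:

```csharp
using System;
using System.Collections.Generic;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Placeholder container/blob standing in for the processor's ownership blob.
BlobClient ownershipBlob =
    new BlobContainerClient("<storage-connection-string>", "checkpoint-container")
        .GetBlobClient("ownership/partition-0");

// The ETag observed when ownership information was last read.
ETag lastKnownETag = new ETag("\"0x8D7E4...\"");

try
{
    // Only succeeds if the blob has not changed since it was read (If-Match).
    await ownershipBlob.SetMetadataAsync(
        new Dictionary<string, string> { ["ownerid"] = Guid.NewGuid().ToString() },
        new BlobRequestConditions { IfMatch = lastKnownETag });
}
catch (RequestFailedException ex) when (ex.Status == 412)
{
    // Another processor instance already claimed the partition. This is a normal,
    // expected code path, but the Storage client's diagnostics have already
    // recorded the 412 response as a failed dependency call.
}
```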
I've left a comment here #9934 (comment); basically, marking 4xx responses as failures is the Azure Monitor (Application Insights) approach. If we attempt to change this from the Azure SDK side, it will become inconsistent with the rest of the Azure Monitor logic for handling 4xx status codes on incoming and outgoing requests. One approach we provide is to add some custom logic in your code to mark such failures as non-failures. This is a bit involved, but it allows you to customize almost everything. From the Azure Monitor side, I believe we should do a better job of helping you isolate such calls and indicating that they are noise. App Map, for example, has a filter to remove 4xx failures, and I think we should do more. @sebader, can you please help me understand the issue a bit better?
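As a rough sketch of that custom-logic option, assuming the standard Application Insights ITelemetryProcessor extension point, something along these lines can re-mark the expected 412 dependency calls as successful. The processor name is hypothetical, and the ResultCode check is an assumption to verify against the telemetry you actually see:

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// Hypothetical processor that re-marks expected 412 dependency calls as successful.
public class ExpectedBlobConflictProcessor : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    public ExpectedBlobConflictProcessor(ITelemetryProcessor next) => _next = next;

    public void Process(ITelemetry item)
    {
        // Assumption: the conditional ownership requests appear as dependency
        // telemetry with a 412 result code. Adjust the checks to match what you
        // observe in Application Insights.
        if (item is DependencyTelemetry dependency && dependency.ResultCode == "412")
        {
            dependency.Success = true;
        }

        _next.Process(item);
    }
}
```

In ASP.NET Core, a processor like this can be registered with services.AddApplicationInsightsTelemetryProcessor&lt;ExpectedBlobConflictProcessor&gt;().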
@jsquire thanks for looking into it and for the thorough explanation! @lmolkova First of all, I didn't have any idea where the error came from; I just saw it popping up in my monitoring. From a user perspective, of course you get concerned if there are unexplained errors. A user does not know that, in this case, those represent "works as expected". If I as a user see errors that seem to be related to checkpointing, I get concerned that my checkpoints might not be properly written and that I will run into issues. So I would say, no, they are not just noise. Without making it clear to the user (and I don't really know what that could look like here), it raises concern. I also would expect to see errors from underlying SDKs in my monitoring, if they represent actual errors that I as the app owner need to take care of. When that's not the case, I would expect the SDK to hide them, or at least clearly mark them as noise in the monitoring, without me as the user having to look through GitHub, docs, etc. to find out what's going on and then manually build a filter. Does this make sense from a user perspective?
@lmolkova: I'm not quite sure what the next steps would be here; there does not seem to be any action that the Event Hubs client library can directly take to influence the behavior, and there are legitimate considerations raised against the proposal in #9934. Should we open an issue somewhere for consideration, or is this something that is considered by design and that we aren't able to influence?
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azmonapplicationinsights.
Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!
What is the expected feedback here from me? Was there any change made?
There shouldn't be anything needed from you at this point, @sebader. Actions are needed from the Azure Monitor team. The bot was reacting to tags, but I don't believe those tags were accurate.
Describe the bug
I'm using Azure.Messaging.EventHubs.Processor (5.0.1) with an Event Hub that has 32 partitions. Every partition gets checkpointed every 10 seconds (if new data arrived). Now I've started to notice in Application Insights that some of the checkpointing calls to Blob storage fail with a 412 error code.
I can also see the errors as "Client Error" in my Blob storage metrics. Most of the calls seem to work fine, but some produce the error. This looks like something inside the SDK to me, not directly related to my code.
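For reference, the setup is roughly the standard EventProcessorClient checkpointing pattern; a sketch with placeholder connection strings and names (not the actual code from this report):

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Consumer;
using Azure.Storage.Blobs;

// Placeholder connection strings and names.
var checkpointContainer = new BlobContainerClient("<storage-connection-string>", "checkpoints");
var processor = new EventProcessorClient(
    checkpointContainer,
    EventHubConsumerClient.DefaultConsumerGroupName,
    "<event-hubs-connection-string>",
    "<event-hub-name>");

processor.ProcessEventAsync += async args =>
{
    // ... handle args.Data ...
    // The checkpoint is written to Blob storage; partition-ownership claims are the
    // conditional requests that can surface the 412 responses discussed in this issue.
    await args.UpdateCheckpointAsync(args.CancellationToken);
};

processor.ProcessErrorAsync += args =>
{
    Console.Error.WriteLine($"Error in partition '{args.PartitionId}': {args.Exception}");
    return Task.CompletedTask;
};

await processor.StartProcessingAsync();
```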
Expected behavior
Should run without errors.
To Reproduce
Hard to tell. The errors also appear when there is almost zero load on the Event Hub (only a handful of messages).
Happy to jump on a screen share if that helps.
Environment: