
[BUG] Azure.Monitor.OpenTelemetry.AspNetCore 1.0.0-beta.6 does not work #38687

Closed
lmolkova opened this issue Sep 13, 2023 · 14 comments
Labels
Client This issue points to a problem in the data-plane of the library. Monitor - Exporter Monitor OpenTelemetry Exporter Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@lmolkova
Member

Library name and version

Azure.Monitor.OpenTelemetry.AspNetCore 1.0.0-beta.6

Describe the bug

When using Azure.Monitor.OpenTelemetry.AspNetCore 1.0.0-beta.6 in a .NET 7 minimal-API ASP.NET Core app running on localhost, my application does not start within ~30 seconds.

It works perfectly fine with Azure.Monitor.OpenTelemetry.AspNetCore 1.0.0-beta.5.

If I debug the application, I can see a series of HttpRequestExceptions, apparently thrown while trying to resolve resources:

[screenshot: HttpRequestException stack traces in the debugger]

Expected behavior

Application starts and reports telemetry

Actual behavior

Application does not start

Reproduction Steps

app (Program.cs):

using Azure.Monitor.OpenTelemetry.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

// Configure the Azure Monitor distro; connection string elided.
builder.Services.AddOpenTelemetry()
    .UseAzureMonitor(o => o.ConnectionString = "...");

var app = builder.Build();

app.MapGet("/ping", async () =>
{
    // Outgoing HTTP call, so the app also emits a dependency span.
    await new HttpClient().GetStringAsync("https://microsoft.com");
    return new { message = "hello" };
});

app.Run();

project file:

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>net7.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Azure.Monitor.OpenTelemetry.AspNetCore" Version="1.0.0-beta.6" />
  </ItemGroup>

</Project>

Environment

No response

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. Monitor - Exporter Monitor OpenTelemetry Exporter needs-team-triage Workflow: This issue needs the team to triage. labels Sep 13, 2023
@jsquire jsquire added Service Attention Workflow: This issue is responsible by Azure service team. and removed needs-team-triage Workflow: This issue needs the team to triage. labels Sep 14, 2023
@github-actions

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @cijothomas @reyang @rajkumar-rangaraj @TimothyMothra @vishweshbankwar.

@cijothomas
Contributor

Is the app not starting at all (i.e., crashing with an exception), or does it start and work fine, just taking much longer?

@lmolkova
Member Author

It does not crash. I assume it will eventually start after exhausting all retries and timeouts trying to get IMDS metadata from the VM, but I did not wait long enough to see.

Either way, the VM resource detector is blocking and needs a much smaller timeout and no retries.

@cijothomas
Contributor

Since the Resource must be added before building the providers, this is going to be an issue: any detector that takes time will add to the startup time (the Azure and AWS detectors should all face the same challenge).
I don't think there is any retry logic, so it must be purely the timeout waiting for the response.

Some possible (not perfect) mitigations:

It might be possible for the distro to detect that it is running in App Service (via an environment-variable check) and then not add the VM detector; a sketch of this idea follows below.

Or modify the Azure VM detector itself to quickly bail out, similar to what the legacy SDKs do.
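
For illustration, a minimal sketch of the first mitigation, assuming the AppServiceResourceDetector and AzureVMResourceDetector types from the OpenTelemetry.ResourceDetectors.Azure contrib package and the well-known WEBSITE_SITE_NAME App Service environment variable; the distro's actual wiring may differ:

using OpenTelemetry.Resources;
using OpenTelemetry.ResourceDetectors.Azure;

// Sketch only: skip the (potentially slow) Azure VM IMDS probe when an
// App Service environment variable tells us we are not on a plain VM.
bool inAppService =
    !string.IsNullOrEmpty(Environment.GetEnvironmentVariable("WEBSITE_SITE_NAME"));

var resourceBuilder = ResourceBuilder.CreateDefault();
resourceBuilder = inAppService
    // The App Service detector only reads environment variables, so it is fast.
    ? resourceBuilder.AddDetector(new AppServiceResourceDetector())
    // Only probe IMDS when the app might actually be on an Azure VM; this is
    // the call that can block startup if no short timeout is configured.
    : resourceBuilder.AddDetector(new AzureVMResourceDetector());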

@vishweshbankwar
Contributor

open-telemetry/opentelemetry-dotnet-contrib#1307
This change is not released yet. It should help, but we are still looking for ways to further reduce the startup time.

@lmolkova
Member Author

It seems there is no timeout set on that client (https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/src/OpenTelemetry.ResourceDetectors.Azure/AzureVmMetaDataRequestor.cs#L31), and HttpClient's default is 100 seconds!

The IMDS endpoint is local, so it is very reliable; 1 second would probably be enough.

This is something that blocks onboarding and work in dev environments, and it has to be fixed.
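
For reference, a minimal sketch of the suggested fix (not the contrib repo's actual code; the IMDS URL and api-version here are illustrative): give the metadata HttpClient an explicit short timeout and treat any failure as "not an Azure VM":

using System.Net.Http;

// IMDS is link-local (169.254.169.254): it either answers almost immediately
// or the host is not an Azure VM, so a 1-2 second timeout is plenty.
using var imdsClient = new HttpClient { Timeout = TimeSpan.FromSeconds(2) };

using var request = new HttpRequestMessage(
    HttpMethod.Get,
    "http://169.254.169.254/metadata/instance?api-version=2021-02-01");
request.Headers.Add("Metadata", "True"); // header required by IMDS

string? vmMetadataJson = null;
try
{
    using var response = await imdsClient.SendAsync(request);
    response.EnsureSuccessStatusCode();
    vmMetadataJson = await response.Content.ReadAsStringAsync();
}
catch (Exception)
{
    // Timeout or connection failure: bail out quickly and report no VM
    // resource attributes instead of blocking application startup.
}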

@TimothyMothra
Contributor

open-telemetry/opentelemetry-dotnet-contrib#1307 This change is not released yet. It should help, but we are still looking for ways to further reduce the startup time.

@vishweshbankwar That PR is from a month ago. Do you know when the next release will be?

@TimothyMothra
Contributor

it seems there is no timeout set on that client

Thanks for investigating @lmolkova, would you be able to submit a PR for the quick fix?

@lmolkova
Member Author

Not this week (and I'm OOF the next one). The fix is short and simple, but the key part here is testing it; I wonder why it was not detected via automation.

@vishweshbankwar
Contributor

it seems there is no timeout set on that client

Thanks for investigating @lmolkova, would you be able to submit a PR for the quick fix?

@TimothyMothra We can get the fix done and request for release next week.

@vishweshbankwar
Contributor

@lmolkova - The distro should still work, just with a delayed startup; do you see different behavior?

I wonder why it was not detected via automation

Are you referring to tests?

@lmolkova
Member Author

lmolkova commented Sep 14, 2023

@lmolkova - The distro should still work, just with a delayed startup; do you see different behavior?

I was not patient enough to wait, and I assume it is unreasonable to expect a production application (not running on App Service or an Azure VM) to take 100 seconds to start.

Are you referring to tests?

Yep, do we have integration tests for the distro? If so, why didn't they detect this?

@TimothyMothra
Contributor

Yep, do we have integration tests for the distro? If so, why didn't they detect this?

The integration tests run in an Azure VM and don't have the delay.

@TimothyMothra
Contributor

I believe this issue can be closed. The root cause was the timeout behavior of the Azure VM metadata service (IMDS) requests; this has been fixed in both the Exporter and the ResourceDetector.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 19, 2023