Failure Operations Sent To Application Insights #33741

mike-england · 2024-01-08T19:09:29Z

Package Name: azure-monitor-opentelemetry
Package Version: 1.1.1
Operating System: MacOS, Debian (bookworm)
Python Version: 3.12.1

Describe the bug
I've been able to instrument my flask application and after some struggling to get the application map to show up correctly, I think I've got everything reporting correctly and to my liking. That said, I do get a "Failed Operation" showing up in application insights. it looks like one every 3-5 seconds. The really bad part is I'm not able to look at any additional information as they don't have a response code.

I thought it might be related to this issue: #33623 as I was seeing these status messages. I silenced those with the following but, of course, any data they're actually sending would still be showing up an application insights

azure_api_logging = logging.getLogger('azure') azure_api_logging.setLevel(logging.ERROR)

I've attached a screen shot of the strange operations. Has anyone seen something like this before?

To Reproduce
I don't have a clean, simple example. I'm currently running flask 3.0 but also saw this behaviour with flask 2.1.2.

Expected behavior
No operation errors reported

Screenshots

The text was updated successfully, but these errors were encountered:

github-actions · 2024-01-08T23:59:02Z

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azmonapplicationinsights.

lzchen · 2024-01-09T17:39:34Z

@mike-england

Could you upload a code snippet of how you are instrumenting with the SDK as well as any logging/flask calls you are making.

mike-england · 2024-01-10T19:18:06Z

Yes, I think I've been able to narrow this down to the minimum to reproduce the problem:

app.py

import logging

from flask import Flask
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from waitress import serve

from azure_telemetry import setup_logging

FORMAT = '%(asctime)s - {%(module)s:%(lineno)d} %(levelname)s - %(message)s'
myHandler = logging.StreamHandler()

logging.basicConfig(format=FORMAT, level=logging.INFO, handlers=[myHandler], datefmt="%Y-%m-%d %H:%M:%S %Z")
logger = logging.getLogger(__name__)

logging.getLogger('azure').setLevel(logging.ERROR)
app = Flask(__name__)

@app.route('/health')
def health():
    return {"status":"all good"}

if __name__ == "__main__":
    instrumentor = FlaskInstrumentor()
    instrumentor.instrument_app(app)
    
    setup_logging()

    serve(app, port=8080, threads=4)

azure_telemetry.py

from os import environ

from azure.monitor.opentelemetry import configure_azure_monitor

azure_connection_string = environ.get('AZURE_CONNECTION', 'InstrumentationKey=cd7428fa-655b-4ace-bdd8-f4e146f74fae;IngestionEndpoint=https://westus2-2.in.applicationinsights.azure.com/')
azure_cloud_role = environ.get('AZURE_CLOUD_ROLE', 'Azure-Test')
# this works, but I'd rather set programatically
environ['OTEL_SERVICE_NAME'] = azure_cloud_role

def setup_logging():
    configure_azure_monitor(connection_string=azure_connection_string)

requirements.txt

Flask==3.0.0
waitress==2.1.2
azure-monitor-opentelemetry==1.1.1

Dockerfile

# For more information, please refer to https://aka.ms/vscode-docker-python
FROM python:3.12.1-slim-bookworm

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE 1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED 1

RUN apt-get update
RUN apt-get install -y libsasl2-dev python-dev-is-python3 libldap2-dev libssl-dev gcc librdkafka-dev

# Install pip requirements
ADD requirements.txt .
RUN python -m pip install -r requirements.txt

WORKDIR /app
ADD . /app

# Switching to a non-root user, please refer to https://aka.ms/vscode-docker-python-user-rights
RUN useradd appuser && chown -R appuser /app
USER appuser

# see https://github.com/Azure/azure-sdk-for-python/issues/33295
ENV OTEL_EXPERIMENTAL_RESOURCE_DETECTORS "azure_app_service"
ENV OTEL_PYTHON_EXCLUDED_URLS="health,swagger.*"

# During debugging, this entry point will be overridden. For more information, please refer to https://aka.ms/vscode-docker-python-debug
CMD ["python", "app.py"]

I copied the docker file from another project so there are a couple extra packages in there but I don't think they have anything to do with this.

build / run: docker build -t azuretelemetry . && docker run --rm -p 8080:8080 azuretelemetry

And then hit the /health endpoint http://localhost:8080/health a bunch of times

When we run this under kubernetes, the /health endpoint gets hit a lot which is what I think I'm seeing in the operations error page.

lzchen · 2024-01-17T20:55:40Z

@mike-england

configure_azure_monitor() already calls FlaskInstrumentor().instrument() underneath the hood. Could you remove the calls

instrumentor = FlaskInstrumentor()
    instrumentor.instrument_app(app)
```
and just call `setup_logging()` from your app and see if you still are getting this issue?

mike-england · 2024-01-18T17:51:07Z

Technically the problem goes away, but so does all the data. I don't get any failed requests but I'm also missing server response times, and nothing shows up in the application map. I do continue to get logs under traces.

If I put that line in, all of the data is there plus those errors. Did you get different results from the example code? Maybe I've got something setup incorrectly in my azure application insights.

lzchen · 2024-01-18T19:00:03Z

@mike-england

We have a basic example for flask here, could you try this and then build your application on top of it to see if you get telemetry?

mike-england · 2024-01-18T22:53:40Z

Do you guys happen to have a 'complete' example? I can't run that directly which has been a source of confusion.

lzchen · 2024-01-19T19:21:28Z

May I ask why you aren't able to run it directly? Are you running into errors? The idea is to have a basic setup using flask and azure monitor distro and then build your solution on top of it, checking that telemetry is being sent along the way.

mike-england · 2024-01-20T05:47:42Z

It'd be nice if there were some notes in the code. There are errors. If I just run it I get this

raise ValueError("Instrumentation key cannot be none or empty.")

And of course I need to set APPLICATIONINSIGHTS_CONNECTION_STRING.

And once that's fixed it hangs with an error Exception in detector <opentelemetry.resource.detector.azure.vm.AzureVMResourceDetector object at 0x104727580>, ignoring

I can fix that by setting OTEL_EXPERIMENTAL_RESOURCE_DETECTORS="azure_app_service" and then the demo runs.

I will also say that I think I've been able to fix my problem, at least with the test code I have above. It's all about ordering, so it'd be nice if the team was able to do something about that.

Yes, I need to run configure_azure_monitor() before flask is initialized, but after my logging has been established.

That results in app.py looking like this:

import logging
from azure.monitor.opentelemetry import configure_azure_monitor

FORMAT = '%(asctime)s - {%(module)s:%(lineno)d} %(levelname)s - %(message)s'
myHandler = logging.StreamHandler()

logging.basicConfig(format=FORMAT, level=logging.INFO, handlers=[myHandler], datefmt="%Y-%m-%d %H:%M:%S %Z")
logger = logging.getLogger(__name__)

logging.getLogger('azure').setLevel(logging.ERROR)

configure_azure_monitor()

from flask import Flask, redirect,url_for
from waitress import serve
from blueprint import blueprint as blueprintv1

app = Flask(__name__)

@app.route('/health')
def health():
    return {"status":"all good"}

@app.route('/')
def hello():
    return redirect(url_for('.apiv1.doc'))

if __name__ == "__main__":
    app.register_blueprint(blueprintv1, url_prefix="/api/v1.0")
    serve(app, port=8080, threads=4)

Not as intuitive as I'd like but so far, seems to be working.

mike-england · 2024-01-20T22:40:58Z

And with some further testing, just to be clear to anyone running into this thread. You need to import flask after you've initialized azure monitor, not just initializing flask. You also need to setup any local logging you'd like before or you will get azure logs but you won't get anything local.

This means I can technically move the from waitress and from blueprint statements to the top of the file but the rest need to stay. Again, in my opinion, not ideal. I'd rather need a separate flask initialization and have the flow of my code be as I like but at least this is working now without any phantom failed requests

mike-england · 2024-01-21T03:57:22Z

Sorry, one more thing. I think the requirements are also tripping me up. Things were working find on my laptop but when I put it into a docker container the application map and server requests stopped working. At first I thought this was flask 3.0, however I think it's actually Werkzeug. If I use this as my requirements.txt file:

Flask==2.3.3
azure-monitor-opentelemetry==1.1.1
Werkzeug==2.3.8

It works fine, but if I use this:

Flask==2.3.3
azure-monitor-opentelemetry==1.1.1
Werkzeug==3.0.1

It no longer works. I'm trying to run flask 3.0+ which requires Werkzeug 3.0+.

lzchen · 2024-01-22T19:15:33Z

@mike-england

Alot to unpack here. I'll attempt to address your points individually

raise ValueError("Instrumentation key cannot be none or empty.")

We have numerous amount of documentation related to having to set your connection string in our official docs and github. The concept of connection string should be understood as an Applicaiton Insights concept before using the SDK.

Exception in detector <opentelemetry.resource.detector.azure.vm.AzureVMResourceDetector object at 0x104727580>, ignoring

This is a recent bug that was introduced that SHOULD be mitigated with the latest release. Please try to upgrade your SDK to the latest version to see if you are still getting errors (without setting the env var). If you are still getting the hanging, you can set the env var for now to unblock you but we do want to try to debug that use case without it. Should not be indicative of regular usage.

You also need to setup any local logging you'd like before or you will get azure logs but you won't get anything local.

This shouldn't be the case. If you are getting this behavior then this is not the intention of the SDK to change your configured behavior outside of monitoring. Try running this example to see if you still witness this related to logging.

You need to import flask after you've initialized azure monitor, not just initializing flask.

I believe you can do import flask before running configure_azure_monitor() but not from flask Import Flask. Our example just needs updating most likely.

At first I thought this was flask 3.0, however I think it's actually Werkzeug.

Flask 3.0+ is currently not supported in the OpenTelemetry Flask instrumentation

mike-england · 2024-01-23T05:36:09Z

Thanks for your patience. I think my first two comments were really about things I expected. I have read as much documentation as I can find but found things a bit confusing. Having a couple lines of comment in that demo code could go a long way to letting new people get up and running, but that's just my 2 cents.

For initialization order, I'll play more with this, but for sure I have something that works. So, while it has been a bit frustrating, I'm through that.

As for flask 3, I was pretty sure I read it somewhere, but I guess not. I have seen this issue, so hopefully something will be in a release soon and that would solve my problem:

open-telemetry/opentelemetry-python-contrib#2013

xiangyan99 added Service Attention Workflow: This issue is responsible by Azure service team. and removed needs-team-triage Workflow: This issue needs the team to triage. labels Jan 8, 2024

xiangyan99 assigned lzchen Jan 8, 2024

github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Jan 8, 2024

mike-england closed this as completed Jan 24, 2024

mike-england mentioned this issue Mar 5, 2024

Azure Failures #34616

Open

cecheta mentioned this issue Apr 5, 2024

Correctly configure app insights for the web app Azure-Samples/chat-with-your-data-solution-accelerator#592

Merged

github-actions bot locked and limited conversation to collaborators Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failure Operations Sent To Application Insights #33741

Failure Operations Sent To Application Insights #33741

mike-england commented Jan 8, 2024

github-actions bot commented Jan 8, 2024

lzchen commented Jan 9, 2024

mike-england commented Jan 10, 2024

lzchen commented Jan 17, 2024

mike-england commented Jan 18, 2024

lzchen commented Jan 18, 2024 •

edited

Loading

mike-england commented Jan 18, 2024

lzchen commented Jan 19, 2024 •

edited

Loading

mike-england commented Jan 20, 2024 •

edited

Loading

mike-england commented Jan 20, 2024

mike-england commented Jan 21, 2024 •

edited

Loading

lzchen commented Jan 22, 2024

mike-england commented Jan 23, 2024

Failure Operations Sent To Application Insights #33741

Failure Operations Sent To Application Insights #33741

Comments

mike-england commented Jan 8, 2024

github-actions bot commented Jan 8, 2024

lzchen commented Jan 9, 2024

mike-england commented Jan 10, 2024

lzchen commented Jan 17, 2024

mike-england commented Jan 18, 2024

lzchen commented Jan 18, 2024 • edited Loading

mike-england commented Jan 18, 2024

lzchen commented Jan 19, 2024 • edited Loading

mike-england commented Jan 20, 2024 • edited Loading

mike-england commented Jan 20, 2024

mike-england commented Jan 21, 2024 • edited Loading

lzchen commented Jan 22, 2024

mike-england commented Jan 23, 2024

lzchen commented Jan 18, 2024 •

edited

Loading

lzchen commented Jan 19, 2024 •

edited

Loading

mike-england commented Jan 20, 2024 •

edited

Loading

mike-england commented Jan 21, 2024 •

edited

Loading