Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listener for events from SqlClient for .Net Framework #704

Merged
merged 11 commits into from
Feb 24, 2020
Merged

Listener for events from SqlClient for .Net Framework #704

merged 11 commits into from
Feb 24, 2020

Conversation

vhatsura
Copy link
Contributor

@vhatsura vhatsura commented Jan 26, 2020

solves #700

documentation will be added in scope of #702

What needs to be done:

  • try to enable instrumentation engine and check SQL command text (by default, it isn't collected)
  • fill destination object of span (after merging Implement context.destination.address/port for DB spans #665)
  • check other provider types. Can they spread events via EventSource?
    System.Data.SQLite and Microsoft.Data.Sqlite don't spread events via EventSource, however, does it means that other providers also don't do it? Can we set Subtype property of span to mssql?
  • handle sqlExceptionNumber

Unfortunately, it's not possible to capture command text with an event listener. It's possible to do it with instrumentation engine, however, it seems in such case command text is captured from another place (I've checked command text in the payload with enabled instrumentation engine and hosting in IIS, no luck):

In the above cases, the correct way of validating that instrumentation engine is correctly installed is by validating that the SDK version of collected DependencyTelemetry is 'rddp'. 'rdddsd' or 'rddf' indicates dependencies are collected via DiagnosticSource or EventSource callbacks, and hence full SQL query won't be captured.

Capture span representation:

{
  "_index": "apm-7.5.2-span-000001",
  "_type": "_doc",
  "_id": "n25DJnABZF4_3hDy4m6U",
  "_version": 1,
  "_score": null,
  "_source": {
    "parent": {
      "id": "128e06672de0f0bb"
    },
    "observer": {
      "hostname": "ad6e0ae55174",
      "id": "a6272462-6b0f-4b04-9081-08f05f8037dc",
      "type": "apm-server",
      "ephemeral_id": "4ea50d25-d305-4afb-a034-3a903da8bbd4",
      "version": "7.5.2",
      "version_major": 7
    },
    "agent": {
      "name": "dotnet",
      "version": "1.2"
    },
    "trace": {
      "id": "4ed85f88e65ae40377fc1161cb094ebe"
    },
    "@timestamp": "2020-02-08T19:25:38.752Z",
    "ecs": {
      "version": "1.1.0"
    },
    "service": {
      "name": "WebApplication1"
    },
    "processor": {
      "name": "transaction",
      "event": "span"
    },
    "transaction": {
      "id": "128e06672de0f0bb"
    },
    "timestamp": {
      "us": 1581189938752426
    },
    "span": {
      "duration": {
        "us": 92956
      },
      "name": "model",
      "id": "a5d61d8d6c16b1e5",
      "type": "db",
      "db": {
        "instance": "model",
        "statement": "",
        "type": "sql"
      }
    }
  },
  "fields": {
    "@timestamp": [
      "2020-02-08T19:25:38.752Z"
    ]
  },
  "highlight": {
    "span.id": [
      "@kibana-highlighted-field@a5d61d8d6c16b1e5@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1581189938752
  ]
}

@vhatsura vhatsura marked this pull request as ready for review February 8, 2020 19:43
Copy link
Contributor

@gregkalapos gregkalapos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vhatsura, great progress here!

I merged the net46 PR into the Elastic.Apm.Test project, so you can rebase this PR on master, with that the Elastic.Apm.SqlClient.Tests test should also compile.

try to enable instrumentation engine and check SQL command text

I think the situation here is that Application Insights has a feature that is based on byte code instrumentation and they simply inject code into the SqlClient library to access the CommandText - so I assume it's not like they turn on anything, so the SqlClient and related classes emit more diagnostic data, instead they inject byte code into the library to access the data.

Unfortunately, it's not possible to capture command text with an event listener.

I saw a couple of fixes around this and I got this working with "Microsoft.Data.SqlClient" Version="1.1.1".

image

One is this - I assume this will be also included in newer System.Data.SqlClient releases.

@@ -9,13 +10,20 @@ public class SqlClientDiagnosticSubscriber : IDiagnosticsSubscriber
public IDisposable Subscribe(IApmAgent agentComponents)
{
var retVal = new CompositeDisposable();
var initializer = new DiagnosticInitializer(agentComponents.Logger, new[] { new SqlClientDiagnosticListener(agentComponents) });
if (PlatformDetection.IsDotNetCore)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we totally sure about this? I tried to research and understand this and this seems to be correct, but is it always the case that on Full Framework all SqlClient versions emit through EventListener and on .NET Core through DiagnosticSource?

Do you know anything about this?

Copy link
Contributor Author

@vhatsura vhatsura Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what I see in ApplicationInsights repo. EventListener is used only in net45 target. DiagnosticSource is used in both - net45 and netstandard as well as for HttpClient.

As I understand correctly the comment

// NET45 referencing .net core System.Net.Http supports diagnostic listener

it's possible to reference .Net Core version of HttpClient in .Net Framework. If it's so, it can be also possible to do it with SqlClient. In such case, we need to listen DiagnosticSource in .Net Framework too. I've played around it a little bit but without any success. Maybe you have any clue how it's possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the info.

Well, to me that comment in the AI code is a but cryptic :) I know that HttpClient was updated outside the framework with a NuGet pacakge - maybe that's what they mean by "referencing .NET Core System.Net.Http" - I suspect that package was built from the prior CoreFX repo - same for .NET Core.

There isn't really such thing for SqlClient (unless you manually copy assemblies) - or actually Microsoft.Data.SqlClient would be similar, but that seems to use EventListener on Full Framework - which surprised me - I though it'd use DiagnosticSource on both.

Is that maybe why Application Insights subscribe to DiagnosticSource on both? I was unable to get DiagnosticSource out of this on Full Framework.

Nevertheless this is I'd say a minor thing. I think the current code is fine - if someone runs into a situation where we don't turn on the right stuff we can easily find that this is the root cause and we can adapt this part of the code.

What do you think? Keep it as it is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that HttpClient was updated outside the framework with a NuGet pacakge - maybe that's what they mean by "referencing .NET Core System.Net.Http"

I also thought so, however, when I tried to do it, there was loaded usual NetFx version of HttpClient.

I though it'd use DiagnosticSource on both

As I understand correctly, DiagnoticSource is built-in in .Net Core runtime, and as for .Net Framework, it can be added only from the NuGet package. It seems it's one of the main reasons, why DiagnosticSource isn't widely used in .Net Framework.

What do you think? Keep it as it is?

I suggest keeping it as is, at least for now.

Copy link
Contributor

@gregkalapos gregkalapos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding more comments - sorry I wanted to send all at once, I just accidentally sent stuff previously.


protected override void OnEventSourceCreated(EventSource eventSource)
{
if (eventSource != null && eventSource.Name == "Microsoft-AdoNet-SystemData")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw some strange behavior around here. The OnEventSourceCreated method got triggered twice for me resulting with the following Paylaod in the OnEventWritten method:

[0] = "ERROR: Exception in Command Processing for EventSource Microsoft-AdoNet-SystemData: An instance of EventSource with Guid 6a4dfe53-eb50-5332-8473-7b7e10a94fd1 already exists."

So, in the OnEventWritten we never did anything since the payload was not something we expect there.

My hack to make that work was to disable the eventsource and enable it again - I don't know why this happens and I don't really like this code - I'm not sure what should be the solution here.

if (eventSource != null && eventSource.Name == "Microsoft-AdoNet-SystemData")
{
	if (_eventSource != null)
	{
		DisableEvents(_eventSource);
		_eventSource.Dispose();
	}

	EnableEvents(eventSource, EventLevel.Informational, (EventKeywords)1);
	_eventSource = eventSource;
}

(_eventSource is a field on SqlEventListener)

One suspect is that both System.Data and Microsoft.Data triggers this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your findings. I'll look deeper into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregkalapos, your assumption is right. There are 2 event sources with the same GUIDs (GUID is generated based on event source name), one from System.Data.SqlClient and another one from Microsoft.Data.SqlClient.
This is the exact place where the exception is thrown: https://github.com/microsoft/referencesource/blob/08b84d13e81cfdbd769a557b368539aac6a9cb30/mscorlib/system/diagnostics/eventing/eventsource.cs#L3029-L3033
With this place, it's understandable why disposing of previous event source help to solve the issue, however, it's quite unstable, sometimes it doesn't help.
I've created an issue in Microsoft.Data.SqlClient repo. Will look, maybe they have a proper solution.

For now, I suggest keeping it as is with small hacking for our tests because I don't think that a lot of people use both libraries in one application (maybe I'm wrong). WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest keeping it as is with small hacking for our tests because I don't think that a lot of people use both libraries in one application

Yes, I agree that the point of this discussion is not a very "real world" scenario - I don't think either people would have both in 1 app.

On the other hand - what's the small hack for the tests? Did you already push something for this? 🤔 Currently the tests on full framework fail to me due to this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I haven't committed the hack yet. An idea is pretty simple. As SqlEventSource initializes statically the only one possible approach is to have three solutions. One with a base abstract class which will have test definitions, another one with referencing System.Data.SqlClient and the last one with Microsoft.Data.SqlClient.
I know, it's not a very good solution, but we will have it until version 2.0.0 released with a fix for event source name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregkalapos, I tried to implement solution with 3 projects but it also doesn't work. The issue is the SqlEventSource for System.Data.SqlClient is situated in the System.Data assembly. It means that SqlEventSource is always instantiated even if we don't reference System.Data.SqlClient.
So, my final suggestion (I believe) is to skip instrumentation for Microsoft.Data.SqlClient in .Net Framework. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, my final suggestion (I believe) is to skip instrumentation for Microsoft.Data.SqlClient in .Net Framework. WDYT?

Ok, fine by me.

Overall, I think having 3 projects wouldn't even be a hack, more like a legitimate solution - it's unfortunate that it does not work.

Anyways, let's skip it for Full Framework and see if someone needs it in the future.

Thanks for opening the issue in the SqClient repo.

Nice to have the response there, but having yet another name in the future (as the comment says) makes this thing super complicated - anyways this is outside our control, so we use whatever they offer...

Copy link
Contributor Author

@vhatsura vhatsura Feb 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but having yet another name in the future (as the comment says) makes this thing super complicated

Yes, if it had added from the first release of Microsoft.Data.SqlClient, it would have been simpler to integrate with it as it was done for SQL clients in .Net Core. I want to believe the contracts for events between two SQL clients will be the same

Copy link
Contributor

@gregkalapos gregkalapos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @vhatsura!

I think this is good - just 1 minor thing here and this discussion is still open I guess - at least for me tests still fail. Any info on that?


protected override void OnEventSourceCreated(EventSource eventSource)
{
if (eventSource != null && eventSource.Name == "Microsoft-AdoNet-SystemData")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest keeping it as is with small hacking for our tests because I don't think that a lot of people use both libraries in one application

Yes, I agree that the point of this discussion is not a very "real world" scenario - I don't think either people would have both in 1 app.

On the other hand - what's the small hack for the tests? Did you already push something for this? 🤔 Currently the tests on full framework fail to me due to this problem.

@gregkalapos
Copy link
Contributor

Oh, and I think we'll also need Microsoft.NETFramework.ReferenceAssemblies in the Elastic.Apm.SqlClient.Tests project in order to build on Linux.

[2020-02-21T22:24:53.931Z] /usr/share/dotnet/sdk/3.1.100/Microsoft.Common.CurrentVersion.targets(1175,5): error MSB3644: The reference assemblies for .NETFramework,Version=v4.6.1 were not found. To resolve this, install the Developer Pack (SDK/Targeting Pack) for this framework version or retarget your application. You can download .NET Framework Developer Packs at https://aka.ms/msbuild/developerpacks [/var/lib/jenkins/workspace/tnet_apm-agent-dotnet-mbp_PR-704/apm-agent-dotnet/test/Elastic.Apm.SqlClient.Tests/Elastic.Apm.SqlClient.Tests.csproj]
[2020-02-21T22:24:53.931Z]     2 Warning(s)
[2020-02-21T22:24:53.931Z]     1 Error(s)
[2020-02-21T22:24:53.931Z] 
[2020-02-21T22:24:53.931Z] Time Elapsed 00:01:20.51

@vhatsura
Copy link
Contributor Author

vhatsura commented Feb 22, 2020

Oh, and I think we'll also need Microsoft.NETFramework.ReferenceAssemblies in the Elastic.Apm.SqlClient.Tests project in order to build on Linux.

[2020-02-21T22:24:53.931Z] /usr/share/dotnet/sdk/3.1.100/Microsoft.Common.CurrentVersion.targets(1175,5): error MSB3644: The reference assemblies for .NETFramework,Version=v4.6.1 were not found. To resolve this, install the Developer Pack (SDK/Targeting Pack) for this framework version or retarget your application. You can download .NET Framework Developer Packs at https://aka.ms/msbuild/developerpacks [/var/lib/jenkins/workspace/tnet_apm-agent-dotnet-mbp_PR-704/apm-agent-dotnet/test/Elastic.Apm.SqlClient.Tests/Elastic.Apm.SqlClient.Tests.csproj]
[2020-02-21T22:24:53.931Z]     2 Warning(s)
[2020-02-21T22:24:53.931Z]     1 Error(s)
[2020-02-21T22:24:53.931Z] 
[2020-02-21T22:24:53.931Z] Time Elapsed 00:01:20.51

I moved reference to Microsoft.NETFramework.ReferenceAssemblies to root directory build properties as samples and source code already had such reference in their build props. I think it will fix all such issues 😃

@vhatsura vhatsura requested a review from gregkalapos February 22, 2020 10:00
@gregkalapos
Copy link
Contributor

Oh, and I think we'll also need Microsoft.NETFramework.ReferenceAssemblies in the Elastic.Apm.SqlClient.Tests project in order to build on Linux.

[2020-02-21T22:24:53.931Z] /usr/share/dotnet/sdk/3.1.100/Microsoft.Common.CurrentVersion.targets(1175,5): error MSB3644: The reference assemblies for .NETFramework,Version=v4.6.1 were not found. To resolve this, install the Developer Pack (SDK/Targeting Pack) for this framework version or retarget your application. You can download .NET Framework Developer Packs at https://aka.ms/msbuild/developerpacks [/var/lib/jenkins/workspace/tnet_apm-agent-dotnet-mbp_PR-704/apm-agent-dotnet/test/Elastic.Apm.SqlClient.Tests/Elastic.Apm.SqlClient.Tests.csproj]
[2020-02-21T22:24:53.931Z]     2 Warning(s)
[2020-02-21T22:24:53.931Z]     1 Error(s)
[2020-02-21T22:24:53.931Z] 
[2020-02-21T22:24:53.931Z] Time Elapsed 00:01:20.51

I moved reference to Microsoft.NETFramework.ReferenceAssemblies to root directory build properties as samples and source code already had such reference in their build props. I think it will fix all such issues 😃

Cool, nice 👍 .

We have new test fails in CI:

Error
Expected span.Context.Destination.Address to be 
"localhost" with a length of 9, but 
"172.17.0.3" has a length of 10, differs near "172" (index 0).

Also, I'm not totally sure this project properly runs in net461 - we should also look into that, but I don't see that as a blocker for the PR - if it needs more CI adaption we can do it later - Given that this technically should run on a Windows machine that should spawn a SQL Server Linux container - I'm not sure we can do that currently.

@vhatsura
Copy link
Contributor Author

vhatsura commented Feb 22, 2020

Also, I'm not totally sure this project properly runs in net461

Yes, I see that only tests from Elastic.Apm.Tests are included to run on net461. However, it seems there is also an issue with that:

A total of 1 test files matched the specified pattern.

[2020-02-22T13:14:55.618Z] Logging TestHost Diagnostics in file: C:\Users\jenkins\workspace\tnet_apm-agent-dotnet-mbp_PR-704\apm-agent-dotnet\target\diag.host.20-02-22_13-14-54_52382_1.log

[2020-02-22T13:15:00.895Z] [xUnit.net 00:00:04.3089285] Skipping: Elastic.Apm.Tests (could not find dependent assembly 'Elastic.Apm, Version=1.3.0')

[2020-02-22T13:15:00.895Z] No test is available in C:\Users\jenkins\workspace\tnet_apm-agent-dotnet-mbp_PR-704\apm-agent-dotnet\test\Elastic.Apm.Tests\bin\Debug\net461\Elastic.Apm.Tests.dll. Make sure that test discoverer & executors are registered and platform & framework version settings are appropriate and try again.

We have new test fails in CI:

I finally found the place, where I can see test failures. I tried to fix it but I need to restart CI from you

@gregkalapos
Copy link
Contributor

gregkalapos commented Feb 22, 2020

Also, I'm not totally sure this project properly runs in net461

Yes, I see that only tests from Elastic.Apm.Tests are included to run on net461. However, it seems there is also an issue with that:

A total of 1 test files matched the specified pattern.

[2020-02-22T13:14:55.618Z] Logging TestHost Diagnostics in file: C:\Users\jenkins\workspace\tnet_apm-agent-dotnet-mbp_PR-704\apm-agent-dotnet\target\diag.host.20-02-22_13-14-54_52382_1.log

[2020-02-22T13:15:00.895Z] [xUnit.net 00:00:04.3089285] Skipping: Elastic.Apm.Tests (could not find dependent assembly 'Elastic.Apm, Version=1.3.0')

[2020-02-22T13:15:00.895Z] No test is available in C:\Users\jenkins\workspace\tnet_apm-agent-dotnet-mbp_PR-704\apm-agent-dotnet\test\Elastic.Apm.Tests\bin\Debug\net461\Elastic.Apm.Tests.dll. Make sure that test discoverer & executors are registered and platform & framework version settings are appropriate and try again.

We have new test fails in CI:

I finally found the place, where I can find it. I tried to fix it but I need to restart CI from you

I'll make sure the CI runs for your changes now - I'll take a look at the remaining stuff later.
update: green ✅

Copy link
Contributor

@gregkalapos gregkalapos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great - thanks!

@gregkalapos gregkalapos merged commit 2df5d6b into elastic:master Feb 24, 2020
@vhatsura vhatsura deleted the listener/sql/framework branch February 25, 2020 01:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants