Adjust documentation for OpenTelemetry changes in NServiceBus 9.1 #6718

Merged
merged 23 commits into from
Jul 19, 2024
f9d97ff
restructure otel docs per signal
lailabougria Jun 25, 2024
cd2c468
add span structure
lailabougria Jun 25, 2024
77b4a80
adds snippets for new apis
lailabougria Jun 25, 2024
702daab
split snippet and fix name
lailabougria Jun 25, 2024
51126e3
fix naming in graphs
lailabougria Jun 25, 2024
ba31c7e
Fixed version ranges
SzymonPobiega Jun 26, 2024
95134e8
Add snippets folder for 9.1
SzymonPobiega Jun 26, 2024
d427a93
document that delayed messages are always linked
lailabougria Jun 27, 2024
64bfa1e
fix link
lailabougria Jun 27, 2024
36875a1
rename to metrics to match the otel signal
lailabougria Jun 27, 2024
1dad406
Fix the build
SzymonPobiega Jul 2, 2024
f0fed6b
Document the new 9.1 OpenTelemetry meters
mauroservienti Jun 25, 2024
b1cfbb6
Replace meters partials with metrics (same content)
SzymonPobiega Jun 27, 2024
b508c9a
Fix rebase
SzymonPobiega Jun 27, 2024
a9109b6
Adjust the docs for the critical and processing time
SzymonPobiega Jul 12, 2024
0e25ec3
Include the Outgoing messages dispatch time in the critical time.
saratry Jul 12, 2024
8133ec7
Apply suggestions from code review
SzymonPobiega Jul 15, 2024
b85400e
Add more metric definitions and links
SzymonPobiega Jul 15, 2024
6287a99
Add recoverability metrics
SzymonPobiega Jul 18, 2024
8412e6a
Convert host alignment inline code into a snippet
SzymonPobiega Jul 18, 2024
543ec25
Fix reference in snippets
SzymonPobiega Jul 19, 2024
92a546b
Fix notes
SzymonPobiega Jul 19, 2024
f2e5a4c
Fix the build
SzymonPobiega Jul 19, 2024
6 changes: 6 additions & 0 deletions Snippets/Core/Core.sln
@@ -12,6 +12,8 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Core_8.1", "Core_8.1\Core_8
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Core_9", "Core_9\Core_9.csproj", "{5CE08C72-7A6B-44B8-8CC2-4D787D30A155}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Core_9.1", "Core_9.1\Core_9.1.csproj", "{2FFFC06C-64E4-4A61-B3A0-71B8B4EBF041}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -38,6 +40,10 @@ Global
{5CE08C72-7A6B-44B8-8CC2-4D787D30A155}.Debug|Any CPU.Build.0 = Debug|Any CPU
{5CE08C72-7A6B-44B8-8CC2-4D787D30A155}.Release|Any CPU.ActiveCfg = Release|Any CPU
{5CE08C72-7A6B-44B8-8CC2-4D787D30A155}.Release|Any CPU.Build.0 = Release|Any CPU
{2FFFC06C-64E4-4A61-B3A0-71B8B4EBF041}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{2FFFC06C-64E4-4A61-B3A0-71B8B4EBF041}.Debug|Any CPU.Build.0 = Debug|Any CPU
{2FFFC06C-64E4-4A61-B3A0-71B8B4EBF041}.Release|Any CPU.ActiveCfg = Release|Any CPU
{2FFFC06C-64E4-4A61-B3A0-71B8B4EBF041}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
3 changes: 2 additions & 1 deletion Snippets/Core/Core_8/Core_8.csproj
@@ -11,10 +11,11 @@
<PackageReference Include="NServiceBus.Callbacks" Version="4.*" />
<PackageReference Include="NServiceBus.Encryption.MessageProperty" Version="4.*" />
<PackageReference Include="NServiceBus.Metrics.ServiceControl" Version="4.*" />
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.*" />
<PackageReference Include="Newtonsoft.Json" Version="13.*" />
<PackageReference Include="NUnit" Version="3.*" />
<PackageReference Include="NUnit3TestAdapter" Version="3.17.0" />
<PackageReference Include="OpenTelemetry" Version="1.3.0" />
<PackageReference Include="OpenTelemetry" Version="1.*" />
<Reference Include="System.ServiceProcess" />
<Reference Include="System.Web" />
</ItemGroup>
48 changes: 48 additions & 0 deletions Snippets/Core/Core_8/OpenTelemetry/HostIdentifier.cs
@@ -0,0 +1,48 @@
using System;
using System.Net;
using Microsoft.Extensions.DependencyInjection;
using NServiceBus;
using OpenTelemetry.Resources;

namespace Core_8
{
class HostIdentifier
{
void Align(EndpointConfiguration endpointConfiguration, IServiceCollection services)
{
#region opentelemetry-align-host-id
// Generate instance ID shared by all components

// Generate a deterministic (name-based) UUID v5 via
// https://github.com/Faithlife/FaithlifeUtility/blob/master/src/Faithlife.Utility/GuidUtility.cs
var deterministicValue = "MyEndpoint@" + Dns.GetHostName();
Guid serviceInstanceId = GuidUtility.Create(deterministicValue); // or Guid.NewGuid()

// OpenTelemetry
services.AddOpenTelemetry()
.ConfigureResource(builder =>
builder.AddService("MyService", serviceInstanceId: serviceInstanceId.ToString()));

// NServiceBus
endpointConfiguration.UniquelyIdentifyRunningInstance()
.UsingCustomDisplayName("original-instance")
.UsingCustomIdentifier(serviceInstanceId);

// ServiceControl Metrics
endpointConfiguration
.EnableMetrics()
// Not required when already set via UsingCustomIdentifier
.SendMetricDataToServiceControl("particular.monitoring",
TimeSpan.FromMinutes(1), serviceInstanceId.ToString());
#endregion
}

class GuidUtility
{
public static Guid Create(string deterministicValue)
{
throw new NotImplementedException();
}
}
}
}
23 changes: 23 additions & 0 deletions Snippets/Core/Core_9.1/Core_9.1.csproj
@@ -0,0 +1,23 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<TargetFramework>net8.0</TargetFramework>
<RootNamespace>Core9</RootNamespace>
</PropertyGroup>

<ItemGroup>
<FrameworkReference Include="Microsoft.AspNetCore.App" />
</ItemGroup>

<ItemGroup>
<PackageReference Include="NServiceBus" Version="9.1.*" />
<PackageReference Include="NServiceBus.Callbacks" Version="5.*" />
<PackageReference Include="NServiceBus.Encryption.MessageProperty" Version="5.*" />
<PackageReference Include="NServiceBus.Metrics.ServiceControl" Version="5.*" />
<PackageReference Include="Newtonsoft.Json" Version="13.*" />
<PackageReference Include="NUnit" Version="3.*" />
<PackageReference Include="NUnit3TestAdapter" Version="3.17.0" />
<PackageReference Include="OpenTelemetry" Version="1.3.0" />
</ItemGroup>

</Project>
35 changes: 35 additions & 0 deletions Snippets/Core/Core_9.1/OpenTelemetry/ManageTraceDepthUsage.cs
@@ -0,0 +1,35 @@
using System.Threading.Tasks;
using NServiceBus;

namespace Core9_1;

public static class ManageTraceDepthUsage
{
static async Task RequestStartNewTrace(IPipelineContext context)
{
#region opentelemetry-sendoptions-start-new-trace
var options = new SendOptions();
options.StartNewTraceOnReceive();
var message = new MyMessage();
await context.Send(message, options);
#endregion
}

static async Task RequestContinueExistingTrace(IPipelineContext context)
{
#region opentelemetry-publishoptions-continue-trace
var options = new PublishOptions();
options.ContinueExistingTraceOnReceive();
var message = new MyEvent();
await context.Publish(message, options);
#endregion
}

class MyMessage
{
}

class MyEvent
{
}
}
3 changes: 2 additions & 1 deletion Snippets/Core/Core_9/Core_9.csproj
@@ -14,10 +14,11 @@
<PackageReference Include="NServiceBus.Callbacks" Version="5.*" />
<PackageReference Include="NServiceBus.Encryption.MessageProperty" Version="5.*" />
<PackageReference Include="NServiceBus.Metrics.ServiceControl" Version="5.*" />
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.*" />
<PackageReference Include="Newtonsoft.Json" Version="13.*" />
<PackageReference Include="NUnit" Version="3.*" />
<PackageReference Include="NUnit3TestAdapter" Version="3.17.0" />
<PackageReference Include="OpenTelemetry" Version="1.3.0" />
<PackageReference Include="OpenTelemetry" Version="1.*" />
</ItemGroup>

</Project>
48 changes: 48 additions & 0 deletions Snippets/Core/Core_9/OpenTelemetry/HostIdentifier.cs
@@ -0,0 +1,48 @@
using System;
using System.Net;
using Microsoft.Extensions.DependencyInjection;
using NServiceBus;
using OpenTelemetry.Resources;

namespace Core_9
{
class HostIdentifier
{
void Align(EndpointConfiguration endpointConfiguration, IServiceCollection services)
{
#region opentelemetry-align-host-id
// Generate instance ID shared by all components

// Generate a deterministic (name-based) UUID v5 via
// https://github.com/Faithlife/FaithlifeUtility/blob/master/src/Faithlife.Utility/GuidUtility.cs
var deterministicValue = "MyEndpoint@" + Dns.GetHostName();
Guid serviceInstanceId = GuidUtility.Create(deterministicValue); // or Guid.NewGuid()

// OpenTelemetry
services.AddOpenTelemetry()
.ConfigureResource(builder =>
builder.AddService("MyService", serviceInstanceId: serviceInstanceId.ToString()));

// NServiceBus
endpointConfiguration.UniquelyIdentifyRunningInstance()
.UsingCustomDisplayName("original-instance")
.UsingCustomIdentifier(serviceInstanceId);

// ServiceControl Metrics
endpointConfiguration
.EnableMetrics()
// Not required when already set via UsingCustomIdentifier
.SendMetricDataToServiceControl("particular.monitoring",
TimeSpan.FromMinutes(1), serviceInstanceId.ToString());
#endregion
}

class GuidUtility
{
public static Guid Create(string deterministicValue)
{
throw new NotImplementedException();
}
}
}
}
41 changes: 30 additions & 11 deletions monitoring/metrics/definitions.md
@@ -15,42 +15,49 @@ Gathering metrics is important to know how a system works and if it works proper

NServiceBus and ServiceControl capture a number of different metrics about a running endpoint including the processing time, the number of messages in each queue (differentiating between those pulled from the queue, those processed successfully, and those which failed processing), as well as "critical time".

### Handler time

Handler time is the time each handler takes to perform the business logic. It is recorded for each handler separately. It includes serialization of outgoing messages. Handler time does not include any database operations managed by NServiceBus.

### Processing time

Processing time is the time it takes for an endpoint to **successfully** invoke all handlers and sagas for a single incoming message. Processing failures are not included in the processing time metric.
Processing time is the time it takes for an endpoint to **successfully** process an incoming message. It includes:

> [!NOTE]
> Processing time does not include the time to store the outbox operation, transmit outgoing messages to the transport, fetch the incoming message, and complete the incoming message (i.e. commit the transport transaction or acknowledge the message).
- The execution of all handlers and sagas for the incoming message
- The execution of the incoming message processing pipeline, which includes deserialization and, where applicable, user-defined pipeline behaviors and saga loading and saving
- The storing of the outbox operations (if outbox is enabled)

Processing failures are not included in the processing time metric.

> [!NOTE]
> Processing time does not include fetching the incoming message, transmitting outgoing messages to the transport, or completing the incoming message (i.e. committing the transport transaction or acknowledging the message).

### Number of messages pulled from queue

This metric measures the total number of messages that the endpoint reads from its input queue regardless of whether message processing succeeds or fails.


### Number of messages successfully processed

This metric measures the total number of messages that the endpoint successfully processes. In order for a message to be counted by this metric, all handlers must have executed without throwing an exception.


### Number of message processing failures

This metric measures the total number of messages that the endpoint has failed to process. In order for a message to be counted by this metric, at least one handler must have thrown an exception.


### Critical time

Critical time is the time between when a message is sent and when it is fully processed. It is a combination of:

* Network send time: The time a message spends on the network before arriving in the destination queue
* Queue wait time: The time a message spends in the destination queue before being picked up and processed
* Processing time: The time it takes for the destination endpoint to process the message
- Committing the sender outbox transaction (if outbox is enabled)
- Network send time: The time a message spends on the network before arriving in the destination queue
- Queue wait time: The time a message spends in the destination queue before being picked up and processed
- Processing time: The time it takes for the destination endpoint to process the message
- Outgoing messages dispatch time: The time it takes to transmit outgoing messages to the transport

Critical time does _not_ include:

* The time to store the outbox operation, transmit messages to the transport, and complete the incoming message (i.e. commit the transport transaction or acknowledge) because the `TimeSent` header is added with the current time during the dispatch phase, after the outbox operation has completed.
* The time a delayed message is held by a timeout mechanism. (NServiceBus version 7.7 and above.)
- The time to complete the incoming message (i.e. commit the transport transaction or acknowledge)
- The time a delayed message is held by a timeout mechanism. (NServiceBus version 7.7 and above.)

> [!NOTE]
> Because the critical time is calculated from timestamps taken on two different machines (the sender and the receiver of a message), it is affected by the [clock drift problem](https://en.wikipedia.org/wiki/Clock_drift). When the clocks of the machines differ significantly, the critical time may be reported as a negative value. Use a well-known clock synchronization solution such as NTP to mitigate the issue.
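
The relationship between the components can be illustrated with a small worked example. This is only a sketch with made-up timestamps: the variable names and durations are hypothetical, and it models the sender and receiver clocks as plain values rather than reading anything from NServiceBus.

```csharp
using System;

// Hypothetical timeline for a single message (all values are illustrative).
var timeSent = new DateTimeOffset(2024, 7, 19, 12, 0, 0, TimeSpan.Zero); // stamped by the sender
var networkAndQueueWait = TimeSpan.FromMilliseconds(350);                // network send + queue wait
var processingTime = TimeSpan.FromMilliseconds(120);                     // pipeline, handlers, outbox storage
var dispatchTime = TimeSpan.FromMilliseconds(30);                        // outgoing messages dispatch

var processingStarted = timeSent + networkAndQueueWait;
var processingCompleted = processingStarted + processingTime;
var dispatchCompleted = processingCompleted + dispatchTime;

// Critical time spans everything from sending until dispatch has completed.
TimeSpan criticalTime = dispatchCompleted - timeSent; // 350 + 120 + 30 = 500 ms

// Clock drift: if the receiver's clock runs 1 second behind the sender's,
// the receiver observes an earlier "now" and the computed value goes negative.
var receiverClockOffset = TimeSpan.FromSeconds(-1);
TimeSpan observedCriticalTime = (dispatchCompleted + receiverClockOffset) - timeSent; // -500 ms

Console.WriteLine($"Processing time: {processingTime.TotalMilliseconds} ms");
Console.WriteLine($"Critical time: {criticalTime.TotalMilliseconds} ms");
Console.WriteLine($"Critical time with drift: {observedCriticalTime.TotalMilliseconds} ms");
```

With synchronized clocks the critical time here is the sum of the three durations; the drifted value shows why a negative critical time points to a clock problem rather than a processing problem.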
@@ -59,6 +66,18 @@ Critical time does _not_ include:

This metric measures the number of [retries](/nservicebus/recoverability) scheduled by the endpoint (immediate or delayed).

### Immediate retries

This metric measures the number of [immediate retries](/nservicebus/recoverability/#immediate-retries) scheduled by the endpoint.

### Delayed retries

This metric measures the number of [delayed retries](/nservicebus/recoverability/#delayed-retries) scheduled by the endpoint.

### Moved to error queue

This metric measures the number of [messages moved to the error queue](/nservicebus/recoverability/#fault-handling).

### Queue length

This metric tracks the number of messages in the main input queue of an endpoint.
8 changes: 7 additions & 1 deletion nservicebus/operations/opentelemetry.md
@@ -9,4 +9,10 @@ related:

NServiceBus version 8 and above supports [OpenTelemetry](https://opentelemetry.io/docs/instrumentation/net/) through traces, metrics, and logging.

partial: content
partial: traces

partial: metrics

partial: logs

partial: host-identifier
70 changes: 0 additions & 70 deletions nservicebus/operations/opentelemetry_content_core_[8,).partial.md

This file was deleted.

@@ -0,0 +1,16 @@
## Alignment of host identifier

It is recommended to align the instance identifier between NServiceBus and OpenTelemetry so that all logs, metrics, traces, and audit messages can be correlated by host (instance) when needed.

> [!NOTE]
> The OpenTelemetry specification recommends that this be a random UUID. However, it may also be a [deterministic UUID v5](https://opentelemetry.io/docs/specs/semconv/attributes-registry/service/#service-attributes) (for example, a hash of the machine name and endpoint name).

NServiceBus adds a [host identifier to all audit messages](/nservicebus/hosting/override-hostid.md) and this instance identifier is also used to show [performance metrics for each running instance in ServicePulse](/monitoring/metrics/in-servicepulse.md). The [instance identifier used for ServicePulse can be overridden](/monitoring/metrics/install-plugin.md#configuration-instance-id).

OpenTelemetry also allows customizing the instance ID used for `service.instance.id` in various ways.

Consider aligning the instance ID used by OpenTelemetry and the ServiceControl metrics API.

#### Example

snippet: opentelemetry-align-host-id
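
A deterministic identifier can be derived with a name-based UUID (version 5, SHA-1). The following is a minimal sketch, not the implementation used by any particular library: the `DeterministicGuid` class name is hypothetical, and it assumes .NET 8 or later for the big-endian `Guid` overloads.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: creates a name-based (version 5) UUID.
static class DeterministicGuid
{
    public static Guid Create(Guid namespaceId, string name)
    {
        // UUID fields are hashed in network (big-endian) byte order.
        byte[] namespaceBytes = namespaceId.ToByteArray(bigEndian: true);
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        // Hash the namespace bytes followed by the name bytes.
        byte[] input = new byte[namespaceBytes.Length + nameBytes.Length];
        namespaceBytes.CopyTo(input, 0);
        nameBytes.CopyTo(input, namespaceBytes.Length);
        byte[] hash = SHA1.HashData(input);

        // Keep the first 16 bytes, then stamp the version and variant bits.
        Span<byte> uuid = hash.AsSpan(0, 16);
        uuid[6] = (byte)((uuid[6] & 0x0F) | 0x50); // version 5
        uuid[8] = (byte)((uuid[8] & 0x3F) | 0x80); // RFC 4122 variant

        return new Guid(uuid, bigEndian: true);
    }
}
```

Calling `DeterministicGuid.Create(someNamespace, "MyEndpoint@" + Dns.GetHostName())`, where `someNamespace` is a fixed application-chosen `Guid`, yields the same identifier on every restart of the same endpoint on the same machine, which keeps the instance stable across OpenTelemetry, NServiceBus, and ServiceControl.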
@@ -0,0 +1,3 @@
## Logging

NServiceBus supports logging out of the box. To collect OpenTelemetry-compatible logs from NServiceBus endpoints, configure the endpoint to connect traces and logging when using the `Microsoft.Extensions.Logging` package. See the [_Connecting OpenTelemetry traces and logs_ sample](/samples/open-telemetry/logging) for more details.