Document the new 9.1 OpenTelemetry meters #6719

Closed
6 changes: 6 additions & 0 deletions Snippets/Core/Core.sln
@@ -12,6 +12,8 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Core_8.1", "Core_8.1\Core_8
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Core_9", "Core_9\Core_9.csproj", "{5CE08C72-7A6B-44B8-8CC2-4D787D30A155}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Core_9.1", "Core_9.1\Core_9.1.csproj", "{9D189D02-8AF6-4B92-ABFC-224246946917}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -38,6 +40,10 @@ Global
{5CE08C72-7A6B-44B8-8CC2-4D787D30A155}.Debug|Any CPU.Build.0 = Debug|Any CPU
{5CE08C72-7A6B-44B8-8CC2-4D787D30A155}.Release|Any CPU.ActiveCfg = Release|Any CPU
{5CE08C72-7A6B-44B8-8CC2-4D787D30A155}.Release|Any CPU.Build.0 = Release|Any CPU
{9D189D02-8AF6-4B92-ABFC-224246946917}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{9D189D02-8AF6-4B92-ABFC-224246946917}.Debug|Any CPU.Build.0 = Debug|Any CPU
{9D189D02-8AF6-4B92-ABFC-224246946917}.Release|Any CPU.ActiveCfg = Release|Any CPU
{9D189D02-8AF6-4B92-ABFC-224246946917}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
23 changes: 23 additions & 0 deletions Snippets/Core/Core_9.1/Core_9.1.csproj
@@ -0,0 +1,23 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>Core9</RootNamespace>
  </PropertyGroup>

  <ItemGroup>
    <FrameworkReference Include="Microsoft.AspNetCore.App" />
  </ItemGroup>

  <ItemGroup>
    <PackageReference Include="NServiceBus" Version="9.1.*" />
    <PackageReference Include="NServiceBus.Callbacks" Version="5.*" />
    <PackageReference Include="NServiceBus.Encryption.MessageProperty" Version="5.*" />
    <PackageReference Include="NServiceBus.Metrics.ServiceControl" Version="5.*" />
    <PackageReference Include="Newtonsoft.Json" Version="13.*" />
    <PackageReference Include="NUnit" Version="3.*" />
    <PackageReference Include="NUnit3TestAdapter" Version="3.17.0" />
    <PackageReference Include="OpenTelemetry" Version="1.3.0" />
  </ItemGroup>

</Project>
22 changes: 11 additions & 11 deletions monitoring/metrics/definitions.md
@@ -15,42 +15,42 @@ Gathering metrics is important to know how a system works and if it works proper

NServiceBus and ServiceControl capture a number of different metrics about a running endpoint including the processing time, the number of messages in each queue (differentiating between those pulled from the queue, those processed successfully, and those which failed processing), as well as "critical time".


### Processing time

Processing time is the time it takes for an endpoint to **successfully** invoke all handlers and sagas for a single incoming message. Processing failures are not included in the processing time metric.
Processing time is the time it takes for an endpoint to **successfully** process an incoming message. It includes:

- Invoking all handlers and sagas for a single incoming message
- Invoking the incoming message processing pipeline, which includes steps like deserialization or user-defined pipeline behaviors.

Processing failures are not included in the processing time metric.

> [!NOTE]
> Processing time does not include the time to store the outbox operation, transmit outgoing messages to the transport, fetch the incoming message, and complete the incoming message (i.e. commit the transport transaction or acknowledge the message).


### Number of messages pulled from queue

This metric measures the total number of messages that the endpoint reads from its input queue regardless of whether message processing succeeds or fails.


### Number of messages successfully processed

This metric measures the total number of messages that the endpoint successfully processes. In order for a message to be counted by this metric, all handlers must have executed without throwing an exception.


### Number of message processing failures

This metric measures the total number of messages that the endpoint has failed to process. In order for a message to be counted by this metric, at least one handler must have thrown an exception.


### Critical time

Critical time is the time between when a message is sent and when it is fully processed. It is a combination of:

* Network send time: The time a message spends on the network before arriving in the destination queue
* Queue wait time: The time a message spends in the destination queue before being picked up and processed
* Processing time: The time it takes for the destination endpoint to process the message
- Network send time: The time a message spends on the network before arriving in the destination queue
- Queue wait time: The time a message spends in the destination queue before being picked up and processed
- Processing time: The time it takes for the destination endpoint to process the message

Critical time does _not_ include:

* The time to store the outbox operation, transmit messages to the transport, and complete the incoming message (i.e. commit the transport transaction or acknowledge) because the `TimeSent` header is added with the current time during the dispatch phase, after the outbox operation has completed.
* The time a delayed message is held by a timeout mechanism. (NServiceBus version 7.7 and above.)
- The time to store the outbox operation, transmit messages to the transport, and complete the incoming message (i.e. commit the transport transaction or acknowledge) because the `TimeSent` header is added with the current time during the dispatch phase, after the outbox operation has completed.
- The time a delayed message is held by a timeout mechanism. (NServiceBus version 7.7 and above.)

> [!NOTE]
> Due to the fact that the critical time is calculated based on timestamps taken on two different machines (the sender and the receiver of a message), it is affected by the [clock drift problem](https://en.wikipedia.org/wiki/Clock_drift). In cases where the clocks of the machines differ significantly, the critical time may be reported as a negative value. Use well-known clock synchronization solutions such as NTP to mitigate the issue.
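
The clock-drift effect described in the note can be seen with a little arithmetic. The following is a minimal sketch, not NServiceBus code: `CriticalTime.Measure` is a hypothetical helper that subtracts the sender's `TimeSent` timestamp from the completion time read on the receiver's clock, and both the 500 ms elapsed time and the two-second skew are assumed values chosen for illustration.

```c#
using System;

// The sender stamps the message using its own clock.
var timeSent = new DateTimeOffset(2024, 1, 1, 12, 0, 0, TimeSpan.Zero);

// Time that really elapses until processing completes (assumed value).
var actualElapsed = TimeSpan.FromMilliseconds(500);

// Synchronized clocks: the receiver reads the same instant the sender would.
var inSync = CriticalTime.Measure(timeSent, timeSent + actualElapsed);

// The receiver's clock runs two seconds behind the sender's (assumed skew).
var skew = TimeSpan.FromSeconds(-2);
var drifted = CriticalTime.Measure(timeSent, timeSent + actualElapsed + skew);

Console.WriteLine($"In sync: {inSync.TotalMilliseconds} ms");   // 500 ms
Console.WriteLine($"Drifted: {drifted.TotalMilliseconds} ms");  // -1500 ms, a negative critical time

static class CriticalTime
{
    // Hypothetical helper: completion time on the receiver's clock
    // minus the TimeSent timestamp taken on the sender's clock.
    public static TimeSpan Measure(DateTimeOffset timeSent, DateTimeOffset completedOnReceiverClock)
        => completedOnReceiverClock - timeSent;
}
```

Because the subtraction mixes two clocks, any skew larger than the real elapsed time flips the sign of the result, which is why the note recommends clock synchronization.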
67 changes: 66 additions & 1 deletion nservicebus/operations/opentelemetry.md
@@ -3,10 +3,75 @@ title: OpenTelemetry
summary: Observability of NServiceBus endpoints with OpenTelemetry
component: core
reviewed: 2022-06-29
versions: '[8,)'
related:
- samples/open-telemetry
---

NServiceBus version 8 and above supports [OpenTelemetry](https://opentelemetry.io/docs/instrumentation/net/) through traces, metrics, and logging.

partial: content
Enable OpenTelemetry instrumentation in NServiceBus:

snippet: opentelemetry-enableinstrumentation

With OpenTelemetry instrumentation enabled, tracing, metrics, and logging can be individually configured via the OpenTelemetry API itself.

## Traces

NServiceBus endpoints generate OpenTelemetry traces for incoming and outgoing messages. To capture trace information, add the `NServiceBus.Core` activity source to the OpenTelemetry configuration:

snippet: opentelemetry-enabletracing

See the [OpenTelemetry samples](/samples/open-telemetry/) for instructions on how to send trace information to different tools.

## Meters

NServiceBus endpoints can be configured to expose metrics related to message processing. To capture meter information, add the `NServiceBus.Core` meter source to the OpenTelemetry configuration:

snippet: opentelemetry-enablemeters

partial: meters

See the [OpenTelemetry samples](/samples/open-telemetry/) for instructions on how to send metric information to different tools.

## Logging

NServiceBus supports logging out of the box. To collect OpenTelemetry-compatible logs from NServiceBus endpoints, configure the endpoint to connect traces and logs using the `Microsoft.Extensions.Logging` package. See the [_Connecting OpenTelemetry traces and logs_ sample](/samples/open-telemetry/logging) for more details.

## Alignment of host identifier

It is recommended to align the instance identifier between NServiceBus and OpenTelemetry so all logs, metrics, traces and audit messages can be correlated by a host (instance) if needed.

> [!NOTE]
> The OpenTelemetry specification recommends this to be a random UUID. However, it may also be a [deterministic UUID v5](https://opentelemetry.io/docs/specs/semconv/attributes-registry/service/#service-attributes) (for example, a hash of the machine name and endpoint name).

NServiceBus adds a [host identifier to all audit messages](/nservicebus/hosting/override-hostid.md), and this instance identifier is also used to show [performance metrics for each running instance in ServicePulse](/monitoring/metrics/in-servicepulse.md). The [instance identifier used for ServicePulse can be overridden](/monitoring/metrics/install-plugin.md#configuration-instance-id).

OpenTelemetry also allows customizing the instance ID used for `service.instance.id` in various ways.

Consider aligning the instance ID used by OpenTelemetry with the one used by the ServiceControl metrics API.

#### Example

Make legit snippet? (Not sure if this would be an 8 and 9 snippet or a 9.1 only snippet)

```c#
// Generate a (deterministic) instance ID shared by all components.
// Dns requires `using System.Net;`

// Generate a deterministic UUID v5 via https://github.com/Faithlife/FaithlifeUtility/blob/master/src/Faithlife.Utility/GuidUtility.cs
var deterministicValue = "MyEndpoint@" + Dns.GetHostName();
Guid serviceInstanceId = GuidUtility.Create(new Guid("4d63009a-8d0f-11ee-aad7-4c796ed8e320"), deterministicValue); // or Guid.NewGuid()

// OpenTelemetry
services.AddOpenTelemetry()
    .ConfigureResource(rb => rb.AddService("MyService", serviceInstanceId: serviceInstanceId.ToString()));

// NServiceBus
endpointConfiguration.UniquelyIdentifyRunningInstance()
    .UsingCustomDisplayName("original-instance")
    .UsingCustomIdentifier(serviceInstanceId);

// ServiceControl Metrics
endpointConfiguration
    .EnableMetrics()
    // The instance ID argument is not required when already set via UsingCustomIdentifier
    .SendMetricDataToServiceControl("particular.monitoring", TimeSpan.FromMinutes(1), serviceInstanceId.ToString());
```
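
The example above relies on an external `GuidUtility` helper. For readers who prefer not to take that dependency, the following is a minimal sketch of a name-based (version 5) UUID built only on the .NET base class library, following RFC 4122 section 4.3. `DeterministicGuid.CreateV5` is a hypothetical name introduced here for illustration, and using `Environment.MachineName` as part of the input is an assumption; it is not part of NServiceBus or OpenTelemetry.

```c#
using System;
using System.Security.Cryptography;
using System.Text;

// Same namespace GUID as the example above; the name combines endpoint and machine.
var namespaceId = new Guid("4d63009a-8d0f-11ee-aad7-4c796ed8e320");
var instanceId = DeterministicGuid.CreateV5(namespaceId, "MyEndpoint@" + Environment.MachineName);
Console.WriteLine(instanceId); // identical on every run on the same machine

static class DeterministicGuid
{
    // Name-based UUID using SHA-1 (version 5), per RFC 4122 section 4.3.
    public static Guid CreateV5(Guid namespaceId, string name)
    {
        // .NET stores the first three GUID fields little-endian; the RFC hashes network order.
        byte[] namespaceBytes = namespaceId.ToByteArray();
        SwapByteOrder(namespaceBytes);
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        byte[] input = new byte[namespaceBytes.Length + nameBytes.Length];
        Buffer.BlockCopy(namespaceBytes, 0, input, 0, namespaceBytes.Length);
        Buffer.BlockCopy(nameBytes, 0, input, namespaceBytes.Length, nameBytes.Length);

        byte[] hash = SHA1.HashData(input); // available in .NET 5 and later

        byte[] result = new byte[16];
        Array.Copy(hash, result, 16);
        result[6] = (byte)((result[6] & 0x0F) | 0x50); // set version to 5
        result[8] = (byte)((result[8] & 0x3F) | 0x80); // set RFC 4122 variant

        SwapByteOrder(result); // back to the layout expected by the Guid constructor
        return new Guid(result);
    }

    static void SwapByteOrder(byte[] guid)
    {
        Array.Reverse(guid, 0, 4);
        Array.Reverse(guid, 4, 2);
        Array.Reverse(guid, 6, 2);
    }
}
```

Because the value is derived from the endpoint and machine names, a restarted instance keeps the same identifier, whereas `Guid.NewGuid()` yields a new one on every start.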
70 changes: 0 additions & 70 deletions nservicebus/operations/opentelemetry_content_core_[8,).partial.md

This file was deleted.

@@ -0,0 +1,5 @@
### Emitted meters

- `nservicebus.messaging.successes` - Total number of messages processed successfully by the endpoint
- `nservicebus.messaging.fetches` - Total number of messages fetched from the queue by the endpoint
- `nservicebus.messaging.failures` - Total number of messages processed unsuccessfully by the endpoint
13 changes: 13 additions & 0 deletions nservicebus/operations/opentelemetry_meters_core_[9.1,).partial.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
### Emitted meters

- `nservicebus.messaging.successes` - Total number of messages processed successfully by the endpoint
- `nservicebus.messaging.fetches` - Total number of messages fetched from the queue by the endpoint
- `nservicebus.messaging.failures` - Total number of messages processed unsuccessfully by the endpoint
- `nservicebus.messaging.handler_time` - The time the user handling code takes to handle a message
- `nservicebus.messaging.processing_time` - The time the endpoint takes to process a message from when it's fetched from the input queue to when processing completes. It includes:
  - Invoking all handlers and sagas for a single incoming message
  - Invoking the incoming message processing pipeline, which includes steps like deserialization or user-defined pipeline behaviors.
- `nservicebus.messaging.critical_time` - The time between when a message is sent and when it is fully processed. It is a combination of:
  - Network send time: The time a message spends on the network before arriving in the destination queue
  - Queue wait time: The time a message spends in the destination queue before being picked up and processed
  - Processing time: The time it takes for the destination endpoint to process the message
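
To get a feel for how these instruments look to a consumer, the sketch below listens to them in-process with `MeterListener` from `System.Diagnostics.Metrics`, without the OpenTelemetry SDK. It is an illustration under stated assumptions: the stand-in `Meter` only simulates an endpoint, the meter name `NServiceBus.Core` is taken from the Meters section, and the numeric types (`long` for counters, `double` for the time histogram) and the recorded values are assumptions, not confirmed by this change.

```c#
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

var totals = new Dictionary<string, long>();
var processingTimes = new List<double>();

using var listener = new MeterListener();

// Subscribe only to instruments published by the meter of interest.
listener.InstrumentPublished = (instrument, meterListener) =>
{
    if (instrument.Meter.Name == "NServiceBus.Core")
    {
        meterListener.EnableMeasurementEvents(instrument);
    }
};

// Counters (assumed to be long-valued): keep a running total per instrument name.
listener.SetMeasurementEventCallback<long>((instrument, measurement, tags, state) =>
    totals[instrument.Name] = totals.GetValueOrDefault(instrument.Name) + measurement);

// Time histograms (assumed to be double-valued): keep the raw samples.
listener.SetMeasurementEventCallback<double>((instrument, measurement, tags, state) =>
    processingTimes.Add(measurement));

listener.Start();

// Stand-in for a running endpoint; a real endpoint creates these instruments itself.
using var meter = new Meter("NServiceBus.Core");
var successes = meter.CreateCounter<long>("nservicebus.messaging.successes");
var processingTime = meter.CreateHistogram<double>("nservicebus.messaging.processing_time");

successes.Add(1);
successes.Add(1);
processingTime.Record(12.5);

Console.WriteLine($"successes: {totals["nservicebus.messaging.successes"]}");
Console.WriteLine($"processing_time samples: {processingTimes.Count}");
```

In production the same instruments would normally be collected by adding the meter to the OpenTelemetry configuration, as described in the Meters section, rather than by a hand-written listener.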