Skip to content

Commit

Permalink
# Summary
Browse files Browse the repository at this point in the history
Added event sampling for high throughput stats collectors to reduce memory footprint, should hopefully resolve [issue #6](#6).

# Changes
- `JitStatsCollector`, `ThreadPoolSchedulingStatsCollector` and `ContentionStatsCollector` can now be configured to discard a ratio of start/ stop events, reducing the number of events we need to store in a `Cache` at any one time
- A new default sampling rate has been configured for the above collectors (see README)
- Updated README with example on how to configure sampling and details of the defaults for each stats collector
  • Loading branch information
djluck committed Jul 25, 2019
1 parent e11648f commit 7ecf436
Show file tree
Hide file tree
Showing 18 changed files with 362 additions and 55 deletions.
31 changes: 28 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,10 @@ These metrics are essential for understanding the peformance of any non-trivial
Add the packge from [nuget](https://www.nuget.org/packages/prometheus-net.DotNetRuntime):
```powershell
# If you're using v2 of prometheus-net
dotnet add package prometheus-net.DotNetRuntime --version 2.0.8-beta
dotnet add package prometheus-net.DotNetRuntime --version 2.1.0-beta
# If you're using v3 of prometheus-net
dotnet add package prometheus-net.DotNetRuntime --version 3.0.8-beta
dotnet add package prometheus-net.DotNetRuntime --version 3.1.0-beta
```

And then start the collector:
Expand Down Expand Up @@ -49,7 +49,32 @@ The metrics exposed can drive a rich dashboard, giving you a graphical insight i
The harder you work the .NET core runtime, the more events it generates. Event generation and processing costs can stack up, especially around these types of events:
- **JIT stats**: each method compiled by the JIT compiler emits two events. Most JIT compilation is performed at startup and depending on the size of your application, this could impact your startup performance.
- **GC stats**: every 100KB of allocations, an event is emitted. If you are consistently allocating memory at a rate > 1GB/sec, you might like to disable GC stats.
- **.NET thread pool scheduling stats**: For every work item scheduled on the thread pool, two events are emitted. If you are scheduling thousands of items per second on the thread pool, you might like to disable scheduling events.
- **.NET thread pool scheduling stats**: For every work item scheduled on the thread pool, two events are emitted. If you are scheduling thousands of items per second on the thread pool, you might like to disable scheduling events or decrease the sampling rate of these events.

### Sampling
To counteract some of the performance impacts of measuring .NET core runtime events, sampling can be configured on supported collectors:
```
IDisposable collector = DotNetRuntimeStatsBuilder.Customize()
// Only 1 in 10 contention events will be sampled
.WithContentionStats(sampleRate: SampleEvery.TenEvents)
// Only 1 in 100 JIT events will be sampled
.WithJitStats(sampleRate: SampleEvery.HundredEvents)
// Every event will be sampled (disables sampling)
.WithThreadPoolSchedulingStats(sampleRate: SampleEvery.OneEvent)
.StartCollecting();
```

The default sample rates are listed below:
Event collector | Default sample rate
------------------------------ | -------------------
`ThreadPoolSchedulingStats` | `SampleEvery.TenEvents`
`JitStats` | `SampleEvery.TenEvents`
`ContentionStats` | `SampleEvery.TwoEvents`

While the default sampling rates provide a decent balance between accuracy and resource consumption if you're concerned with the accuracy of metrics at all costs,
then feel free to change the sampling rate to `SampleEvery.OneEvent`. If minimal resource consumption (especially memory), is your goal you might like to
reduce the sampling rate.


## Further reading
- The mechanism for listening to runtime events is outlined in the [.NET core 2.2 release notes](https://docs.microsoft.com/en-us/dotnet/core/whats-new/dotnet-core-2-2#core).
Expand Down
2 changes: 1 addition & 1 deletion src/Benchmarks/Program.cs
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ public class TestBenchmark
[MethodImpl(MethodImplOptions.NoOptimization)]
public long TestInterlockedIncLong() => Interlocked.Increment(ref l1);

private EventPairTimer<int> timer = new EventPairTimer<int>(1, 2, x => x.EventId);
private EventPairTimer<int> timer = new EventPairTimer<int>(1, 2, x => x.EventId, SampleEvery.OneEvent);

private EventWrittenEventArgs eventArgs;

Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using NUnit.Framework;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ internal class ContentionStatsCollectorTests : StatsCollectorIntegrationTestBase
{
protected override ContentionStatsCollector CreateStatsCollector()
{
return new ContentionStatsCollector();
return new ContentionStatsCollector(SampleEvery.OneEvent);
}

[Test]
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;
using System.Threading;
using NUnit.Framework;
Expand All @@ -7,11 +10,11 @@

namespace Prometheus.DotNetRuntime.Tests.StatsCollectors.IntegrationTests
{
internal class JitStatsCollectorTests : StatsCollectorIntegrationTestBase<JitStatsCollector>
internal class Given_A_JitStatsCollector_That_Samples_Every_Jit_Event : StatsCollectorIntegrationTestBase<JitStatsCollector>
{
protected override JitStatsCollector CreateStatsCollector()
{
return new JitStatsCollector();
return new JitStatsCollector(SampleEvery.OneEvent);
}

[Test]
Expand Down Expand Up @@ -68,4 +71,37 @@ private int ToJitDynamic()
return o.Length;
}
}

internal class Given_A_JitStatsCollector_That_Samples_Every_Fifth_Jit_Event : StatsCollectorIntegrationTestBase<JitStatsCollector>
{
protected override JitStatsCollector CreateStatsCollector()
{
return new JitStatsCollector(SampleEvery.FiveEvents);
}

[Test]
public void When_many_methods_are_jitted_then_their_compilation_is_measured()
{
// arrange
var methodsJitted = StatsCollector.MethodsJittedTotal.Labels("true").Value;
var methodsJittedSeconds = StatsCollector.MethodsJittedSecondsTotal.Labels("true").Value;

// act
var sp = Stopwatch.StartNew();
Compile100Methods(() => 1);
sp.Stop();

// assert
Assert.That(() => StatsCollector.MethodsJittedTotal.Labels("true").Value, Is.GreaterThanOrEqualTo(methodsJitted + 20).After(100, 10));
Assert.That(StatsCollector.MethodsJittedSecondsTotal.Labels("true").Value, Is.GreaterThan(methodsJittedSeconds + sp.Elapsed.TotalSeconds).Within(0.1));
}

private void Compile100Methods(Expression<Func<int>> toCompile)
{
for (int i = 0; i < 100; i++)
{
toCompile.Compile();
}
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,12 @@
using System.Threading.Tasks;
using NUnit.Framework;
using Prometheus.DotNetRuntime.StatsCollectors;
using Prometheus.DotNetRuntime.StatsCollectors.Util;

namespace Prometheus.DotNetRuntime.Tests.StatsCollectors.IntegrationTests
{
[TestFixture]
internal class ThreadPoolSchedulingStatsCollectorTests : StatsCollectorIntegrationTestBase<ThreadPoolSchedulingStatsCollector>
internal class Given_A_ThreadPoolSchedulingStatsCollector_That_Samples_Every_Event : StatsCollectorIntegrationTestBase<ThreadPoolSchedulingStatsCollector>
{
protected override ThreadPoolSchedulingStatsCollector CreateStatsCollector()
{
Expand All @@ -34,4 +35,35 @@ public async Task When_work_is_queued_on_the_thread_pool_then_the_queued_and_sch
Assert.That(StatsCollector.ScheduleDelay.CollectAllSumValues().Single(), Is.EqualTo(sp.Elapsed.TotalSeconds).Within(0.01));
}
}

[TestFixture]
internal class Given_A_ThreadPoolSchedulingStatsCollector_That_Samples_Fifth_Event : StatsCollectorIntegrationTestBase<ThreadPoolSchedulingStatsCollector>
{
protected override ThreadPoolSchedulingStatsCollector CreateStatsCollector()
{
return new ThreadPoolSchedulingStatsCollector(Constants.DefaultHistogramBuckets, SampleEvery.FiveEvents);
}

[Test]
public async Task When_many_items_of_work_is_queued_on_the_thread_pool_then_the_queued_and_scheduled_work_is_measured()
{
Assert.That(StatsCollector.ScheduledCount.Value, Is.EqualTo(0));

// act (Task.Run will execute the function on the thread pool)
// There seems to be either a bug in the implementation of .NET core or a bug in my understanding...
// First call to Task.Run triggers a queued event but not a queue event. For now, call twice
await Task.Run(() => 1 );

var sp = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
sp.Start();
await Task.Run(() => sp.Stop());
}

Assert.That(() => StatsCollector.ScheduledCount.Value, Is.GreaterThanOrEqualTo(100).After(100, 10));
Assert.That(StatsCollector.ScheduleDelay.CollectAllCountValues().Single(), Is.GreaterThanOrEqualTo(100));
Assert.That(StatsCollector.ScheduleDelay.CollectAllSumValues().Single(), Is.EqualTo(sp.Elapsed.TotalSeconds).Within(0.01));
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -8,15 +8,15 @@
namespace Prometheus.DotNetRuntime.Tests.StatsCollectors.Util
{
[TestFixture]
public class EventPairTimerTests
public class Given_An_EventPairTimer_That_Samples_Every_Event : EventPairTimerBaseClass
{
private const int EventIdStart = 1, EventIdEnd = 2;
private EventPairTimer<long> _eventPairTimer;

[SetUp]
public void SetUp()
{
_eventPairTimer = new EventPairTimer<long>(EventIdStart, EventIdEnd, x => (long)x.Payload[0]);
_eventPairTimer = new EventPairTimer<long>(EventIdStart, EventIdEnd, x => (long)x.Payload[0], SampleEvery.OneEvent);
}

[Test]
Expand Down Expand Up @@ -88,8 +88,54 @@ public void TryGetEventPairDuration_calculates_duration_between_multiple_out_of_
Assert.That(event2Duration.TotalMilliseconds, Is.EqualTo(200));
Assert.That(event3Duration.TotalMilliseconds, Is.EqualTo(100));
}
}


[TestFixture]
public class Given_An_EventPairTimer_That_Samples_Every_2nd_Event : EventPairTimerBaseClass
{

private EventPairTimer<long> _eventPairTimer;

[SetUp]
public void SetUp()
{
_eventPairTimer = new EventPairTimer<long>(EventIdStart, EventIdEnd, x => (long)x.Payload[0], SampleEvery.TwoEvents);
}

[Test]
public void TryGetEventPairDuration_recognizes_start_events_that_will_be_discarded()
{
var startEvent1 = CreateEventWrittenEventArgs(EventIdStart, DateTime.UtcNow, payload: 1L);
Assert.That(_eventPairTimer.TryGetDuration(startEvent1, out var duration),Is.EqualTo(DurationResult.Start));
Assert.That(duration, Is.EqualTo(TimeSpan.Zero));
}

[Test]
public void TryGetEventPairDuration_will_discard_1_event_and_calculate_duration_for_the_2nd_event()
{
// arrange
var now = DateTime.UtcNow;
var startEvent1 = CreateEventWrittenEventArgs(EventIdStart, now, payload: 1L);
var endEvent1 = CreateEventWrittenEventArgs(EventIdEnd, now.AddMilliseconds(300), payload: 1L);
var startEvent2 = CreateEventWrittenEventArgs(EventIdStart, now, payload: 2L);
var endEvent2 = CreateEventWrittenEventArgs(EventIdEnd, now.AddMilliseconds(200), payload: 2L);

_eventPairTimer.TryGetDuration(startEvent1, out var _);
_eventPairTimer.TryGetDuration(startEvent2, out var _);

// act
Assert.That(_eventPairTimer.TryGetDuration(endEvent1, out var _), Is.EqualTo(DurationResult.FinalWithoutDuration));
Assert.That(_eventPairTimer.TryGetDuration(endEvent2, out var _), Is.EqualTo(DurationResult.FinalWithDuration));
}

private EventWrittenEventArgs CreateEventWrittenEventArgs(int eventId, DateTime? timestamp = null, params object[] payload)
}

public class EventPairTimerBaseClass
{
protected const int EventIdStart = 1, EventIdEnd = 2;

protected EventWrittenEventArgs CreateEventWrittenEventArgs(int eventId, DateTime? timestamp = null, params object[] payload)
{
var args = (EventWrittenEventArgs)typeof(EventWrittenEventArgs).CreateInstance(new []{ typeof(EventSource)}, Flags.NonPublic | Flags.Instance, new object[] { null});
args.SetPropertyValue("EventId", eventId);
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
using System;
using NUnit.Framework;
using Prometheus.DotNetRuntime.StatsCollectors.Util;

namespace Prometheus.DotNetRuntime.Tests.StatsCollectors.Util
{
[TestFixture]
public class SamplingRateTests
{
[Test]
[TestCase(SampleEvery.OneEvent, 1)]
[TestCase(SampleEvery.TwoEvents, 2)]
[TestCase(SampleEvery.FiveEvents, 5)]
[TestCase(SampleEvery.TenEvents, 10)]
[TestCase(SampleEvery.TwentyEvents, 20)]
[TestCase(SampleEvery.FiftyEvents, 50)]
[TestCase(SampleEvery.HundredEvents, 100)]
public void SampleEvery_Reflects_The_Ratio_Of_Every_100_Events_That_Will_Be_Sampled(SampleEvery samplingRate, int expected)
{
var sr = new SamplingRate(samplingRate);
Assert.That(sr.SampleEvery, Is.EqualTo(expected));
}

[Test]
[TestCase(SampleEvery.OneEvent, 1000)]
[TestCase(SampleEvery.TwoEvents, 500)]
[TestCase(SampleEvery.FiveEvents, 200)]
[TestCase(SampleEvery.TenEvents, 100)]
[TestCase(SampleEvery.TwentyEvents, 50)]
[TestCase(SampleEvery.FiftyEvents, 20)]
[TestCase(SampleEvery.HundredEvents, 10)]
public void Given_1000_Events_ShouldSampleEvent_Returns_True_Every_Nth_Event(SampleEvery samplingRate, int expectedEvents)
{
var eventsSampled = 0;
var sr = new SamplingRate(samplingRate);

for (int i = 0; i < 1_000; i++)
{
if (sr.ShouldSampleEvent())
{
eventsSampled++;
}
}

Assert.That(eventsSampled, Is.EqualTo(expectedEvents));
}
}
}
29 changes: 22 additions & 7 deletions src/prometheus-net.DotNetRuntime/DotNetRuntimeStatsBuilder.cs
Original file line number Diff line number Diff line change
Expand Up @@ -69,15 +69,20 @@ public IDisposable StartCollecting()
return runtimeStatsCollector;
}


/// <summary>
/// Include metrics around the volume of work scheduled on the worker thread pool
/// and the scheduling delays.
/// </summary>
/// <param name="histogramBuckets">Buckets for the scheduling delay histogram</param>
public Builder WithThreadPoolSchedulingStats(double[] histogramBuckets = null)
/// <param name="sampleRate">
/// The sampling rate for thread pool scheduling events. A lower sampling rate reduces memory use
/// but reduces the accuracy of metrics produced (as a percentage of events are discarded).
/// If your application achieves a high level of throughput (thousands of work items scheduled per second on
/// the thread pool), it's recommend to reduce the sampling rate even further.
/// </param>
public Builder WithThreadPoolSchedulingStats(double[] histogramBuckets = null, SampleEvery sampleRate = SampleEvery.TenEvents)
{
StatsCollectors.Add(new ThreadPoolSchedulingStatsCollector(histogramBuckets ?? Constants.DefaultHistogramBuckets));
StatsCollectors.Add(new ThreadPoolSchedulingStatsCollector(histogramBuckets ?? Constants.DefaultHistogramBuckets, sampleRate));
return this;
}

Expand All @@ -94,19 +99,29 @@ public Builder WithThreadPoolStats()
/// <summary>
/// Include metrics around volume of locks contended.
/// </summary>
public Builder WithContentionStats()
/// <param name="sampleRate">
/// The sampling rate for contention events (defaults to 100%). A lower sampling rate reduces memory use
/// but reduces the accuracy of metrics produced (as a percentage of events are discarded).
/// </param>
public Builder WithContentionStats(SampleEvery sampleRate = SampleEvery.TwoEvents)
{
StatsCollectors.Add(new ContentionStatsCollector());
StatsCollectors.Add(new ContentionStatsCollector(sampleRate));
return this;
}

/// <summary>
/// Include metrics summarizing the volume of methods being compiled
/// by the Just-In-Time compiler.
/// </summary>
public Builder WithJitStats()
/// <param name="sampleRate">
/// The sampling rate for JIT events. A lower sampling rate reduces memory use
/// but reduces the accuracy of metrics produced (as a percentage of events are discarded).
/// If your application achieves a high level of throughput (thousands of work items scheduled per second on
/// the thread pool), it's recommend to reduce the sampling rate even further.
/// </param>
public Builder WithJitStats(SampleEvery sampleRate = SampleEvery.TenEvents)
{
StatsCollectors.Add(new JitStatsCollector());
StatsCollectors.Add(new JitStatsCollector(sampleRate));
return this;
}

Expand Down
Loading

0 comments on commit 7ecf436

Please sign in to comment.