GCAllocationTick_V2 ETW events are reporting wrong allocation sizes in multithreaded application #45375

Temp1ar · 2020-11-30T15:58:17Z

Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:

using System.Linq;
using System.Threading;
class Program
{
  public static void Main(string[] args)
  {
    var threadsCount = args.Any() ? int.Parse(args[0]) : 4;
    var waitHandles = new WaitHandle[threadsCount];
    int counter = 1024;
    for (int i = 0; i != threadsCount; ++i)
    {
      var waitHandle = new AutoResetEvent(false);
      var thread = new Thread(() =>
      {
        while (Interlocked.Decrement(ref counter) > 0)
        {
          var _ = new byte[200 * 1024];
        }
        waitHandle.Set();
      });
      thread.Start();
      waitHandles[i] = waitHandle;
    }
    WaitHandle.WaitAll(waitHandles);
  }
}

If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:

However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:

Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?

The text was updated successfully, but these errors were encountered:

ghost · 2020-11-30T19:22:47Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:

using System.Linq;
using System.Threading;
class Program
{
  public static void Main(string[] args)
  {
    var threadsCount = args.Any() ? int.Parse(args[0]) : 4;
    var waitHandles = new WaitHandle[threadsCount];
    int counter = 1024;
    for (int i = 0; i != threadsCount; ++i)
    {
      var waitHandle = new AutoResetEvent(false);
      var thread = new Thread(() =>
      {
        while (Interlocked.Decrement(ref counter) > 0)
        {
          var _ = new byte[200 * 1024];
        }
        waitHandle.Set();
      });
      thread.Start();
      waitHandles[i] = waitHandle;
    }
    WaitHandle.WaitAll(waitHandles);
  }
}

If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:

However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:

Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?

Author:	Temp1ar
Assignees:	-
Labels:	`area-GC-coreclr`, `untriaged`
Milestone:	-

ghost · 2020-11-30T20:38:55Z

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Issue Details

Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:

using System.Linq;
using System.Threading;
class Program
{
  public static void Main(string[] args)
  {
    var threadsCount = args.Any() ? int.Parse(args[0]) : 4;
    var waitHandles = new WaitHandle[threadsCount];
    int counter = 1024;
    for (int i = 0; i != threadsCount; ++i)
    {
      var waitHandle = new AutoResetEvent(false);
      var thread = new Thread(() =>
      {
        while (Interlocked.Decrement(ref counter) > 0)
        {
          var _ = new byte[200 * 1024];
        }
        waitHandle.Set();
      });
      thread.Start();
      waitHandles[i] = waitHandle;
    }
    WaitHandle.WaitAll(waitHandles);
  }
}

If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:

However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:

Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?

Author:	Temp1ar
Assignees:	-
Labels:	`area-System.Diagnostics.Tracing`, `untriaged`
Milestone:	-

mangod9 · 2020-11-30T20:39:58Z

I believe the events might be lossy at higher scale, but adding Tracing folks to confirm

Temp1ar · 2020-12-01T00:19:04Z

I believe the events might be lossy at higher scale, but adding Tracing folks to confirm

Actually the number of events are the same: 1023 and 1024. It's AllocationAmount field value is incorrect for some of them.

Maoni0 · 2020-12-01T02:19:19Z

you have found a bug! the update of etw_allocation_running_amount is not under a lock so that's why it became inaccurate when you ran it multi-threaded. this has been this way for a long time but we don't have diagnostic tests that verify that the info we fire in ETW events is accurate. I think people who use this in general don't care a lot about the absolute accuracy - as long as it's good enough it helps them find problem they are fine with it. but this is a bug. etw_allocation_running_amount is declared as a PER_HEAP var but by the time we updated it (with an ordinary +) we already gave up the per heap lock.

we should fire this while we are still holding the lock.

ghost · 2020-12-01T05:25:40Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:

using System.Linq;
using System.Threading;
class Program
{
  public static void Main(string[] args)
  {
    var threadsCount = args.Any() ? int.Parse(args[0]) : 4;
    var waitHandles = new WaitHandle[threadsCount];
    int counter = 1024;
    for (int i = 0; i != threadsCount; ++i)
    {
      var waitHandle = new AutoResetEvent(false);
      var thread = new Thread(() =>
      {
        while (Interlocked.Decrement(ref counter) > 0)
        {
          var _ = new byte[200 * 1024];
        }
        waitHandle.Set();
      });
      thread.Start();
      waitHandles[i] = waitHandle;
    }
    WaitHandle.WaitAll(waitHandles);
  }
}

If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:

However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:

Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?

Author:	Temp1ar
Assignees:	-
Labels:	`area-GC-coreclr`, `untriaged`
Milestone:	-

mangod9 · 2020-12-01T05:25:47Z

Thanks @Temp1ar and @Maoni0. I have moved back to the gc area.

Maoni0 · 2021-07-12T20:12:58Z

I'll include this with my instrumentation changes.

We've never had ETW events that had fields about time - we rely on the the timestamps of the ETW events themselves to calculate time. This checkin introduces some new events/event fields that will include time info instead of firing individual events that otherwise don't carry much info, ie, they'd just be there so we could use their timestamps, which would be a nuisance when we have many heaps. The rationale behind firing events with time already calculated is 1) it does reduce overhead since we don't fire as many events so we can fire these in fewer events for informational level 2) firing individual vents and having the tools interpret them isn't very useful unlike events such as GCStart/GCEnd which can be correlated with other events (eg, you get GCStart, and then get a bunch of other events so you know those happened after a GC started) wheras things like very GC internal things don't have this property, ie, we are not gonna care that "these other events happened during a GC and specifically during the relocaton phase". --- Added MarkWithType events for marking due to dependent handles, newly promoted due to dead finalizable objects and mark_steal. Perfview needs to be updated to work with this otherwise you can't see the GCStats view (I'm submitting a PR for that). Recorded time for marking roots (but sizedref is separate), short weak, ScanForFinalization, long weak, relocate, compact and sweep. Added a new version that includes the size of the object that triggered the event. This is for a request from #49424 (comment). Provided a new rundown GCSettings event that has info on settings hard to get from traces. Added a GCLOHCompact event which is fired for all heaps (heaps that didn't actually do LOH compact would have values of all 0s). I'm trying to add events that don't require a lot of correlation with other events to interpret. This is to help get an idea how long it takes to compact LOH and how reference rich it is. Added a verbose level GCFitBucketInfo event which helps us with FL fitting investigation. I'm firing this for 2 things in a gen1 GC: 1) for plugs that allocated with allocate_in_condemned_generations the event will capture all of them with the same bucketing as we do for gen2 FL; 2) for gen2 FL we look at the largest free items that take up 25% of the FL space, or if there are too many of them we stop after walking a certain number of free items as we have to limit the amount of time we are spending here. --- Fixed issues - For BGC we were reporting the pinned object count the same as the last FGC..and that causes confusion so fixed that. Fixed #45375 While fixing #45375, I noticed we have another bug related to alloc tick which is we are not firing the alloc tick events correctly for LOH and POH since the ETW alloc tracking didn't seperate them... fixed this too. Added the POH type for GCSegmentTypeMap which was missing in the manifest. --- Did some cleanup in eventtrace.h - we don't need the info that's not used which means we just ended up duplicating things like _GC_ROOT_KIND in more places than needed. --- Note, I realize that I do have some inconsistency with the FEAETURE_EVENT_TRACE here, as in, some code should be under an #ifdef check but is not. I will look into a remedy for that with a separate PR.

mairaw transferred this issue from dotnet/core Nov 30, 2020

Dotnet-GitSync-Bot added area-System.Threading untriaged New issue has not been triaged by the area owner labels Nov 30, 2020

stephentoub added area-GC-coreclr and removed area-System.Threading labels Nov 30, 2020

mangod9 added area-System.Diagnostics.Tracing and removed area-GC-coreclr labels Nov 30, 2020

mangod9 added area-GC-coreclr and removed area-System.Diagnostics.Tracing labels Dec 1, 2020

mangod9 removed the untriaged New issue has not been triaged by the area owner label Dec 1, 2020

mangod9 added this to the 6.0.0 milestone Dec 1, 2020

Maoni0 self-assigned this Jul 12, 2021

Maoni0 mentioned this issue Jul 18, 2021

instrumentation changes #55888

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 18, 2021

Maoni0 closed this as completed in #55888 Jul 22, 2021

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 22, 2021

ghost locked as resolved and limited conversation to collaborators Aug 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GCAllocationTick_V2 ETW events are reporting wrong allocation sizes in multithreaded application #45375

GCAllocationTick_V2 ETW events are reporting wrong allocation sizes in multithreaded application #45375

Temp1ar commented Nov 30, 2020

ghost commented Nov 30, 2020

ghost commented Nov 30, 2020

mangod9 commented Nov 30, 2020

Temp1ar commented Dec 1, 2020

Maoni0 commented Dec 1, 2020

ghost commented Dec 1, 2020

mangod9 commented Dec 1, 2020

Maoni0 commented Jul 12, 2021

GCAllocationTick_V2 ETW events are reporting wrong allocation sizes in multithreaded application #45375

GCAllocationTick_V2 ETW events are reporting wrong allocation sizes in multithreaded application #45375

Comments

Temp1ar commented Nov 30, 2020

ghost commented Nov 30, 2020

ghost commented Nov 30, 2020

mangod9 commented Nov 30, 2020

Temp1ar commented Dec 1, 2020

Maoni0 commented Dec 1, 2020

ghost commented Dec 1, 2020

mangod9 commented Dec 1, 2020

Maoni0 commented Jul 12, 2021