Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCAllocationTick_V2 ETW events are reporting wrong allocation sizes in multithreaded application #45375

Closed
Temp1ar opened this issue Nov 30, 2020 · 8 comments · Fixed by #55888
Assignees
Milestone

Comments

@Temp1ar
Copy link

Temp1ar commented Nov 30, 2020

Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:

using System.Linq;
using System.Threading;
class Program
{
  public static void Main(string[] args)
  {
    var threadsCount = args.Any() ? int.Parse(args[0]) : 4;
    var waitHandles = new WaitHandle[threadsCount];
    int counter = 1024;
    for (int i = 0; i != threadsCount; ++i)
    {
      var waitHandle = new AutoResetEvent(false);
      var thread = new Thread(() =>
      {
        while (Interlocked.Decrement(ref counter) > 0)
        {
          var _ = new byte[200 * 1024];
        }
        waitHandle.Set();
      });
      thread.Start();
      waitHandles[i] = waitHandle;
    }
    WaitHandle.WaitAll(waitHandles);
  }
}

If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:

image

However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:

image

Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?

@mairaw mairaw transferred this issue from dotnet/core Nov 30, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Threading untriaged New issue has not been triaged by the area owner labels Nov 30, 2020
@ghost
Copy link

ghost commented Nov 30, 2020

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:

using System.Linq;
using System.Threading;
class Program
{
  public static void Main(string[] args)
  {
    var threadsCount = args.Any() ? int.Parse(args[0]) : 4;
    var waitHandles = new WaitHandle[threadsCount];
    int counter = 1024;
    for (int i = 0; i != threadsCount; ++i)
    {
      var waitHandle = new AutoResetEvent(false);
      var thread = new Thread(() =>
      {
        while (Interlocked.Decrement(ref counter) > 0)
        {
          var _ = new byte[200 * 1024];
        }
        waitHandle.Set();
      });
      thread.Start();
      waitHandles[i] = waitHandle;
    }
    WaitHandle.WaitAll(waitHandles);
  }
}

If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:

image

However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:

image

Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?

Author: Temp1ar
Assignees: -
Labels:

area-GC-coreclr, untriaged

Milestone: -

@ghost
Copy link

ghost commented Nov 30, 2020

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Issue Details

Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:

using System.Linq;
using System.Threading;
class Program
{
  public static void Main(string[] args)
  {
    var threadsCount = args.Any() ? int.Parse(args[0]) : 4;
    var waitHandles = new WaitHandle[threadsCount];
    int counter = 1024;
    for (int i = 0; i != threadsCount; ++i)
    {
      var waitHandle = new AutoResetEvent(false);
      var thread = new Thread(() =>
      {
        while (Interlocked.Decrement(ref counter) > 0)
        {
          var _ = new byte[200 * 1024];
        }
        waitHandle.Set();
      });
      thread.Start();
      waitHandles[i] = waitHandle;
    }
    WaitHandle.WaitAll(waitHandles);
  }
}

If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:

image

However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:

image

Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?

Author: Temp1ar
Assignees: -
Labels:

area-System.Diagnostics.Tracing, untriaged

Milestone: -

@mangod9
Copy link
Member

mangod9 commented Nov 30, 2020

I believe the events might be lossy at higher scale, but adding Tracing folks to confirm

@Temp1ar
Copy link
Author

Temp1ar commented Dec 1, 2020

I believe the events might be lossy at higher scale, but adding Tracing folks to confirm

Actually the number of events are the same: 1023 and 1024. It's AllocationAmount field value is incorrect for some of them.

@Maoni0
Copy link
Member

Maoni0 commented Dec 1, 2020

you have found a bug! the update of etw_allocation_running_amount is not under a lock so that's why it became inaccurate when you ran it multi-threaded. this has been this way for a long time but we don't have diagnostic tests that verify that the info we fire in ETW events is accurate. I think people who use this in general don't care a lot about the absolute accuracy - as long as it's good enough it helps them find problem they are fine with it. but this is a bug. etw_allocation_running_amount is declared as a PER_HEAP var but by the time we updated it (with an ordinary +) we already gave up the per heap lock.

we should fire this while we are still holding the lock.

@ghost
Copy link

ghost commented Dec 1, 2020

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:

using System.Linq;
using System.Threading;
class Program
{
  public static void Main(string[] args)
  {
    var threadsCount = args.Any() ? int.Parse(args[0]) : 4;
    var waitHandles = new WaitHandle[threadsCount];
    int counter = 1024;
    for (int i = 0; i != threadsCount; ++i)
    {
      var waitHandle = new AutoResetEvent(false);
      var thread = new Thread(() =>
      {
        while (Interlocked.Decrement(ref counter) > 0)
        {
          var _ = new byte[200 * 1024];
        }
        waitHandle.Set();
      });
      thread.Start();
      waitHandles[i] = waitHandle;
    }
    WaitHandle.WaitAll(waitHandles);
  }
}

If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:

image

However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:

image

Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?

Author: Temp1ar
Assignees: -
Labels:

area-GC-coreclr, untriaged

Milestone: -

@mangod9
Copy link
Member

mangod9 commented Dec 1, 2020

Thanks @Temp1ar and @Maoni0. I have moved back to the gc area.

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Dec 1, 2020
@mangod9 mangod9 added this to the 6.0.0 milestone Dec 1, 2020
@Maoni0
Copy link
Member

Maoni0 commented Jul 12, 2021

I'll include this with my instrumentation changes.

@Maoni0 Maoni0 self-assigned this Jul 12, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 18, 2021
Maoni0 added a commit that referenced this issue Jul 22, 2021
We've never had ETW events that had fields about time - we rely on the the timestamps of the ETW events themselves to calculate time. This checkin introduces some new events/event fields that will include time info instead of firing individual events that otherwise don't carry much info, ie, they'd just be there so we could use their timestamps, which would be a nuisance when we have many heaps.

The rationale behind firing events with time already calculated is 

1) it does reduce overhead since we don't fire as many events so we can fire these in fewer events for informational level
2) firing individual vents and having the tools interpret them isn't very useful unlike events such as GCStart/GCEnd which can be correlated with other events (eg, you get GCStart, and then get a bunch of other events so you know those happened after a GC started) wheras things like very GC internal things don't have this property, ie, we are not gonna care that "these other events happened during a GC and specifically during the relocaton phase". 

---

Added MarkWithType events for marking due to dependent handles, newly promoted due to dead finalizable objects and mark_steal. Perfview needs to be updated to work with this otherwise you can't see the GCStats view (I'm submitting a PR for that).

Recorded time for marking roots (but sizedref is separate), short weak, ScanForFinalization, long weak, relocate, compact and sweep.

Added a new version that includes the size of the object that triggered the event. This is for a request from #49424 (comment). 

Provided a new rundown GCSettings event that has info on settings hard to get from traces. 

Added a GCLOHCompact event which is fired for all heaps (heaps that didn't actually do LOH compact would have values of all 0s). I'm trying to add events that don't require a lot of correlation with other events to interpret. This is to help get an idea how long it takes to compact LOH and how reference rich it is.

Added a verbose level GCFitBucketInfo event which helps us with FL fitting investigation. I'm firing this for 2 things in a gen1 GC:

1) for plugs that allocated with allocate_in_condemned_generations the event will capture all of them with the same bucketing as we do for gen2 FL;

2) for gen2 FL we look at the largest free items that take up 25% of the FL space, or if there are too many of them we stop after walking a certain number of free items as we have to limit the amount of time we are spending here. 

---

Fixed issues -

For BGC we were reporting the pinned object count the same as the last FGC..and that causes confusion so fixed that.

Fixed #45375

While fixing #45375, I noticed we have another bug related to alloc tick which is we are not firing the alloc tick events correctly for LOH and POH since the ETW alloc tracking didn't seperate them... fixed this too.

Added the POH type for GCSegmentTypeMap which was missing in the manifest.

---

Did some cleanup in eventtrace.h - we don't need the info that's not used which means we just ended up duplicating things like _GC_ROOT_KIND in more places than needed. 

---

Note, I realize that I do have some inconsistency with the FEAETURE_EVENT_TRACE here, as in, some code should be under an #ifdef check but is not. I will look into a remedy for that with a separate PR.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 22, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants