-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GCAllocationTick_V2 ETW events are reporting wrong allocation sizes in multithreaded application #45375
Comments
Tagging subscribers to this area: @dotnet/gc Issue DetailsAssume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:
If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers: However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right: Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?
|
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti Issue DetailsAssume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:
If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers: However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right: Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?
|
I believe the events might be lossy at higher scale, but adding Tracing folks to confirm |
Actually the number of events are the same: 1023 and 1024. It's AllocationAmount field value is incorrect for some of them. |
you have found a bug! the update of we should fire this while we are still holding the lock. |
Tagging subscribers to this area: @dotnet/gc Issue DetailsAssume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:
If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers: However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right: Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?
|
I'll include this with my instrumentation changes. |
We've never had ETW events that had fields about time - we rely on the the timestamps of the ETW events themselves to calculate time. This checkin introduces some new events/event fields that will include time info instead of firing individual events that otherwise don't carry much info, ie, they'd just be there so we could use their timestamps, which would be a nuisance when we have many heaps. The rationale behind firing events with time already calculated is 1) it does reduce overhead since we don't fire as many events so we can fire these in fewer events for informational level 2) firing individual vents and having the tools interpret them isn't very useful unlike events such as GCStart/GCEnd which can be correlated with other events (eg, you get GCStart, and then get a bunch of other events so you know those happened after a GC started) wheras things like very GC internal things don't have this property, ie, we are not gonna care that "these other events happened during a GC and specifically during the relocaton phase". --- Added MarkWithType events for marking due to dependent handles, newly promoted due to dead finalizable objects and mark_steal. Perfview needs to be updated to work with this otherwise you can't see the GCStats view (I'm submitting a PR for that). Recorded time for marking roots (but sizedref is separate), short weak, ScanForFinalization, long weak, relocate, compact and sweep. Added a new version that includes the size of the object that triggered the event. This is for a request from #49424 (comment). Provided a new rundown GCSettings event that has info on settings hard to get from traces. Added a GCLOHCompact event which is fired for all heaps (heaps that didn't actually do LOH compact would have values of all 0s). I'm trying to add events that don't require a lot of correlation with other events to interpret. This is to help get an idea how long it takes to compact LOH and how reference rich it is. Added a verbose level GCFitBucketInfo event which helps us with FL fitting investigation. I'm firing this for 2 things in a gen1 GC: 1) for plugs that allocated with allocate_in_condemned_generations the event will capture all of them with the same bucketing as we do for gen2 FL; 2) for gen2 FL we look at the largest free items that take up 25% of the FL space, or if there are too many of them we stop after walking a certain number of free items as we have to limit the amount of time we are spending here. --- Fixed issues - For BGC we were reporting the pinned object count the same as the last FGC..and that causes confusion so fixed that. Fixed #45375 While fixing #45375, I noticed we have another bug related to alloc tick which is we are not firing the alloc tick events correctly for LOH and POH since the ETW alloc tracking didn't seperate them... fixed this too. Added the POH type for GCSegmentTypeMap which was missing in the manifest. --- Did some cleanup in eventtrace.h - we don't need the info that's not used which means we just ended up duplicating things like _GC_ROOT_KIND in more places than needed. --- Note, I realize that I do have some inconsistency with the FEAETURE_EVENT_TRACE here, as in, some code should be under an #ifdef check but is not. I will look into a remedy for that with a separate PR.
Assume we have a program which is allocating fixed amount of memory(~200MB) in a tight loop with portions of 200KB in multiple threads:
If I run this program on .NET Core 5 (Windows 10) with one thread I'll get 1023 events with 204 848 bytes allocated each which is expected. Sum of allocation sizes column: ~210MB (see PerfView status bar in the bottom). If I ran multiple times - I get same numbers:
However If I run this program with 4 or 8 threads, allocation sizes will start to differ and total sum will not be equal to 200MB. It fluctuates between 70(Release configuration) to 360MB. Which can't be right:
Should there be thread synchronization on this allocation size counter or synchronization is not viable option because of low overhead requirement?
The text was updated successfully, but these errors were encountered: