Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory issue (crash) #3733

Closed
2 tasks done
Wyrade opened this issue Nov 6, 2021 · 10 comments · Fixed by #4530
Closed
2 tasks done

Memory issue (crash) #3733

Wyrade opened this issue Nov 6, 2021 · 10 comments · Fixed by #4530
Labels
bug Something isn't working crash Causes PoB to crash and is High Priority

Comments

@Wyrade
Copy link

Wyrade commented Nov 6, 2021

Check version

  • I'm running the latest version of Path of Building and I've verified this by checking the changelog

Check for duplicates

  • I've checked for duplicate issues by using the search function of the issue tracker

What is the expected behaviour/value?

PoB not crashing/running out of memory.

What is the actual behaviour/value?

PoB crashing/running out of memory.

How to reproduce the issue

I tried with two character builds, so probably not buildscpecific and 100% reproducable.

  1. Hover over one of the equipped items on the "Items" page in the list of equipped items, doesn't even need to be one with multiple euippable items, one is enough
  2. Start quickly switching items with the mouse scroll up and down (switching between the item and "None" is enough)(switch at least twice per second or more - less is probably enough, but the faster the switch, the faster the crash)
  3. The memory usage starts quickly going up, 40+ MB per tick in the task manager.
  4. At around 1700 MB memory used, PoB just shuts down, showing the window seen in the attachment for about 1-2 seconds. With fast enough switching I can make it 100% consistently crash in about 15 seconds with this.

(Found this accidentally while comparing stat changes switching two items.)
(PoB version 2.13.0, currently latest)

Build code

https://pastebin.com/YByttEAJ

Screenshots

PoB_crash

@Wyrade Wyrade added the bug Something isn't working label Nov 6, 2021
@zao
Copy link
Contributor

zao commented Nov 6, 2021

Reproduced it with these instructions on a build of mine. Impressive 🥇

@zao
Copy link
Contributor

zao commented Nov 6, 2021

Crashing under a debugger shows similar failure to most of the crash dumps in #3722:
0xC0000005: Access violation reading location 0xFFFFFFF1.

Stack trace suggests running out of memory on the Lua side:

>	SimpleGraphic.dll!@lj_err_throw@8�()	C
 	SimpleGraphic.dll!_lj_err_mem�()	C
 	SimpleGraphic.dll!_lj_BC_FUNCC�()	Unknown
 	00010662()	Unknown

@coani
Copy link

coani commented Nov 17, 2021

--- SCRIPT ERROR ---
Runtime error in 'C:\Users\x\AppData\Roaming\Path of Building Community\Launch.lua':
not enough memory

I've had these crashes a few times too

@CrowdControlled
Copy link

Similar crash, but script error says not enough memory.

https://imgur.com/OblK0Cr screen snip of the crash screen.

@ifnjeff
Copy link
Contributor

ifnjeff commented Jan 15, 2022

I have a suspicion this may be an ever-expanding undo snapshot stack, I'm going to take a stab at having the stack store diffs rather than snapshots, and putting a cap on the height of the stack

@coani
Copy link

coani commented Feb 23, 2022

I've had 2 random pob crashes lately, in one case while tinkering with the skill tree, the other time (just now) I had inventory open and had just alt-tabbed out for 2 secs (was comparing items on trade vs mine).
I had been switching between some sort orders for the uniques (sort by combined vs name), wasn't doing anything else there.

The event Viewer error is same for both crashes:

Faulting application name: Path of Building.exe, version: 0.0.0.0, time stamp: 0x606d1b4c
Faulting module name: SimpleGraphic.dll, version: 0.0.0.0, time stamp: 0x61ccf697
Exception code: 0xc0000005
Fault offset: 0x0007ba01
Faulting process id: 0x34f4
Faulting application start time: 0x01d828a9fead5556
Faulting application path: C:\Users\x\AppData\Roaming\Path of Building Community\Path of Building.exe
Faulting module path: C:\Users\x\AppData\Roaming\Path of Building Community\SimpleGraphic.dll
Report Id: 937bbf95-a3ee-42fb-bae8-dd6ce26bba61
Faulting package full name:
Faulting package-relative application ID:

@ImVexed
Copy link

ImVexed commented Mar 28, 2022

Is there any short-term solution for this until the underlying cause is fixed? I keep coming back to this issue every few days when it crashes causing me to lose an hour or two of work every time :(

Maybe save the unsaved work any time something has changed to a temp file similar to Notepad++? Until either, it's saved properly or gracefully quit and an unsaved work popup is shown?

@Fish013
Copy link
Contributor

Fish013 commented Apr 18, 2022

So, I've investigated this a bit further, and like @ifnjeff suspected - the undo list is growing too fast.
There actually exists a cap of 100 entries. However with a pretty empty build each undo snap is already about ~5mb, so for 100 snaps we're using 500mb of ram.
Even if it's pretty huge, it works if I put a garbage collect after every snapshot creation. The issue is that with scrolling we are producing the snapshots faster than the gc collects the dead elements.
I've experimentally set the snapshot limit down to 10 but with fast scrolling we allocate around 220mb additional ram where the pure size of the 10 snaps is "only" around 50mb.

So the options and respective disadvantages I see:

  • reduce undo limit from 100 to something pretty low (10-20)
    • reduces undo usefulness and would still allocate a lot of unnecessary memory (see last paragraph above)
  • gc after every (or every x) undo snapshot creations
    • makes scrolling through the item dropdown feel pretty clunky
  • implement differential snapshots
    • probably huge effort and maybe too computationally expensive (diffing ~5mb nested tables)
  • limiting how frequently subsequent snapshots can be created
    • undo might feel random to the user because it skipped a few actions
  • don't create snapshots for scrolling at all
    • next undo will additionally always revert selected item via scrolling
  • adjust garbage collector settings to be more aggressive (I've tested collectgarbage("setstepmul", 1000) to prevent crashes)
    • memory allocations still overshoot a lot - but might be a good idea regardless of final decision for this issue because we are allocating objects pretty fast sometimes

If anyone has any other ideas feel free to add them

@Wyrade
Copy link
Author

Wyrade commented Apr 18, 2022

@Fish013
I have several ideas, but I'm not really a programmer, so sorry if they are unfeasible or too much effort to do:

  • Every nth time you allocate significant memory, check the current total memory usage, and if it's too big (maybe even just too big difference compared to last such check or the one before that), freeze the UI with a message "cleaning memory", and force the memory cleaning ("garbage cllection"?) to unallocate unneeded memory.
    • Although this way it's not completely under the hood, it's infinitely more preferable than a random crash losing progress, and likely very easy to implement.
  • Have both a quick and slow undo list. The quick one can use the current method, limited to 10-20, and a slow one can work beyond that.
    For the slow one, my ideas:
    • Record each user action on their own, like a macro or replay, make it into an easily readable dropdown list (maybe with timestamps too) or similar so user can select to which point they want to go.
      Then if they want to go past the quick snapshot limit, replay all actions from the initial snapshot, which could be sped up by not displaying each replayed action's changes on the ui, just freezing the UI with a message "in progress", and not calculating anything unnecessary for the replay, like dps, hp and the like.
      After recording n actions, it would probably help this to make extra proper snapshots, so the replay doesn't need to start all the way from the initial one.
      (If this is equivalent to the differential snapshots you mentioned, sorry, in my mind it looks differently under the hood, but I might be wrong and for something like this differential snapshots is the way to go.)
    • At each user action, record its reverse action, the action it would take to change the state back to the previous state, then like the previous idea, play it like a macro/replay from the last proper snapshot, preferably with an easily readable list to select a point to go, so we can skip a lot of unnecessary UI drawing and calculations.
      I assume this is less trivial and/or more work than I'd think and than the previous option, but it sounds feasible to me.
    • Maybe a combination of normal snapshots and one of the previous action recording, depending on what kind of user action was taken.
  • After 10-20, create snapshots only for every nth action, so something like from 20-40 have only every second recorded, past that only every 5th, or similar, to still have some undo options but heavily limit its maximum.
    The main thing here I think is this should be indicated very clearly to the user in real time when they try to do an undo that would undo more than one step. Maybe a popup confirmation window that says "past this point you can only undo two actions at a time. proceed?", which window should be able to be disabled in the settings so it doesn't annoy people too much.

@ifnjeff
Copy link
Contributor

ifnjeff commented Apr 18, 2022

Apologies for not following up on this a while ago, I made some discoveries and attempts that might be useful, but I never arrived at a good final solution. @Fish013 is on the right track, I'll add the extra details I had found in case you or anyone else wants to see this through.

Regarding the issue: One reason for the large size of each snapshot is that some of the tables being copied have references to parts of the enormous constant data table, so we are storing the meaningful user-modified data alongside big chunks of static data. The undo handlers deep-copy their tables, so we're duplicating those huge data blocks.

I considered two solutions, both of which I got a bit stalled on because of unfamiliarity with lua, and how to get around it's odd table semantics.

  • Create a "reference" type to wrap all stored referenes into the data table
    • Table variables are all by-reference, so this would just be a superficial wrapper which gets special treatment in our deep-copy implementation
    • This would remove the data copies from our snapshots which should be a significant chunk, but I think you could still outpace the garbage collector because some information (like the passive tree) is still large and more challenging to untangle constant information from mutable user data
    • I suspect this might become a maintenance headache, as devs would need to remember to wrap all the stored references to data, could easily be forgotten
  • Implement undo states as diffs rather than snapshots
    • I attempted this by implementing a generic recursive diff for lua tables
    • For simple cases, it worked fine, but lua table hierarchies can by cyclic, so special care has to be given to track how self-references change (or don't)
    • Calculating the diff turns out to take a lot longer than just deep-copying, so scrolling through item lists felt sluggish (this could have been an issue with the cyclic tables though, who knows)
  • Implement snapshots by deep-copying only the elements that have changed, leaving the other elements as references
    • This was the implementation that I took the furthest, but eventually got distracted from, I've pushed my old WIP changes
    • Has a similar self-reference challenge to overcome
    • Seemingly had similar performance concerns, again may be possible to overcome
    • My next effort was going to be to add some kind of "constant" wrapper to this implementation
      • Similar to the "reference" wrapper above, but applied to every element of data, rather than to every stored reference into data
      • The goal would be for it to exist as a very thin/transparent wrapper around every element, so that references to it could see that it is constant, but still use it like normal (but being unable to modify it)
      • This could be applied to things outside of data as well, if there are any large constant pieces of information
      • I eventually got distracted by other things while trying to work this out, my lua wasn't strong enough, someone more experienced could give it go though

@Wyrade Wyrade mentioned this issue Jun 24, 2022
2 tasks
@Wires77 Wires77 added this to Crashes Jul 6, 2022
@QuickStick123 QuickStick123 added the crash Causes PoB to crash and is High Priority label Jul 7, 2022
@Wires77 Wires77 moved this to Done in Crashes Jul 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working crash Causes PoB to crash and is High Priority
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

8 participants