
Deal with VCS check plot memory and time issues #10

Closed
durack1 opened this issue Aug 23, 2017 · 23 comments

durack1 commented Aug 23, 2017

The issues described in CDAT/cdat#1424 will be investigated here, along with any tweaks needed to solve the reported problems following the VCS changes documented in CDAT/vcs#236

@doutriaux1

@durack1 @lee1043 please post your code here. I had @lee1043's code leaking nothing before.


durack1 commented Aug 23, 2017

@doutriaux1, it's probably better to keep this issue (posted code, discussion) back in the vcs repo, no? If so, which issue #?

@doutriaux1

Let's start here because I suspect it's user error at this point.


durack1 commented Aug 23, 2017

@doutriaux1 the two variants currently running (as it takes soooooooo long to plot these things, they're still running 12 hrs later...) are:
PCMDI/amipbcs/make_newVsOldDiffs.py
and
PCMDI/amipbcs/make_newVsOldDiffs-delFudge.py

Both reference data on ocean

@doutriaux1

@durack1 you had created a template object that you were not clearing; once this is fixed it seems stable. Will test again shortly when I get onsite, because remotely it is painful. Might try with a mesa version as well, just to see.

(nightly) doutriaux1@oceanonly:[Paul]:[1830]> python test_leak.py 
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightly
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 81.494 secs; Max mem: 1.497 GB PyObj#: 0113248;
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
00002 processing: 1870-02 sic     Time: 93.325 secs; Max mem: 1.911 GB PyObj#: 0113249;
00003 processing: 1870-03 sic     Time: 78.404 secs; Max mem: 1.911 GB PyObj#: 0113005;
00004 processing: 1870-04 sic     Time: 78.045 secs; Max mem: 1.911 GB PyObj#: 0113006;
00005 processing: 1870-05 sic     Time: 76.214 secs; Max mem: 1.911 GB PyObj#: 0113007;

@doutriaux1

@durack1 ok, switching to mesalib I get a ridiculous speed gain... I do see the memory increasing; I will investigate further

(nightly) doutriaux1@oceanonly:[Paul]:[1831]> python test_leak.py 
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightly
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 05.058 secs; Max mem: 1.535 GB PyObj#: 0113186;
00002 processing: 1870-02 sic     Time: 05.195 secs; Max mem: 2.023 GB PyObj#: 0113202;
00003 processing: 1870-03 sic     Time: 04.915 secs; Max mem: 2.026 GB PyObj#: 0113112;
00004 processing: 1870-04 sic     Time: 04.977 secs; Max mem: 2.073 GB PyObj#: 0113133;
00005 processing: 1870-05 sic     Time: 05.094 secs; Max mem: 2.122 GB PyObj#: 0113114;
00006 processing: 1870-06 sic     Time: 05.198 secs; Max mem: 2.174 GB PyObj#: 0113255;
00007 processing: 1870-07 sic     Time: 04.765 secs; Max mem: 2.222 GB PyObj#: 0113196;
00008 processing: 1870-08 sic     Time: 04.840 secs; Max mem: 2.273 GB PyObj#: 0113117;
00009 processing: 1870-09 sic     Time: 04.438 secs; Max mem: 2.318 GB PyObj#: 0113198;


durack1 commented Aug 23, 2017

@doutriaux1 what's the object name, and the command to delete it? I'll add it in and rerun my script/log.


durack1 commented Aug 23, 2017

@doutriaux1 I think a trimmed-down version of this script, along with a single timeslice of input data, should be added to the test suite. It seems to be a very sensitive example of the timing and memory usage when turning mesalib on and off, and of any further tweaks to EzTemplate.
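
For illustration, a minimal sketch of what such a trimmed-down check could look like (the file name, variable slicing and loop length are placeholders, not the actual make_newVsOldDiffs.py logic):

# Hypothetical trimmed-down leak check: one plot per timestep, logging
# wall time, peak RSS and live Python object count, as in the logs above.
import gc
import resource
import time

import cdms2
import vcs

f = cdms2.open('sic_sample.nc')   # placeholder: small single-variable file
canvas = vcs.init()

for i in range(12):               # placeholder: one year of monthly slices
    t0 = time.time()
    s = f['sic'][i:i + 1]         # read a single timeslice
    canvas.plot(s, bg=1)          # background (off-screen) plot
    canvas.png('sic_%02d' % i)
    canvas.clear()                # drop display plots between iterations
    max_rss_gb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.**2
    print('%05d Time: %06.3f secs; Max mem: %.3f GB; PyObj#: %07d'
          % (i + 1, time.time() - t0, max_rss_gb, len(gc.get_objects())))

canvas.close()
f.close()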

@doutriaux1

@durack1 yes, once we figure out where the 50 MB per plot comes from 😉 @danlipsa I suspect something in isofills?


durack1 commented Aug 23, 2017

@doutriaux1 would you like to revisit the "user error" #10 (comment) above?

@doutriaux1

@durack1 @danlipsa, yes, it's something in the plotting of isofill/boxfill; I will track it down today or tomorrow.


durack1 commented Aug 23, 2017

@doutriaux1 time to move discussions back into the vcs repo? I think it'll be cleaner to generate a dedicated issue and move comments there

@doutriaux1

Not fully revisiting the user error: there were some leftover templates creating the additional objects you complained about, so that part was definitely user error. There is only one textorientation created per plot now, so that user-error part is now solved. We still need to figure out why contours create such a memory leak, though.


durack1 commented Aug 23, 2017

What was the object that required deletion? You didn't answer my question in #10 (comment) above


durack1 commented Aug 23, 2017

@doutriaux1 see CDAT/cdat#1424 (comment); it may in fact have already been identified by @chaosphere2112

The tool described in CDAT/cdat#1424 (comment) may also be really useful for determining the cause and finally sorting out this issue once and for all

@doutriaux1

@durack1 rightfully pointed out that using X we have no more leak!

(nightlyX) doutriaux1@oceanonly:[Paul]:[1896]> python test_leak.py 
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightlyX
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 02.280 secs; Max mem: 1.543 GB PyObj#: 0113551;
00002 processing: 1870-02 sic     Time: 02.083 secs; Max mem: 2.008 GB PyObj#: 0113254;
00003 processing: 1870-03 sic     Time: 01.329 secs; Max mem: 2.038 GB PyObj#: 0113366;
00004 processing: 1870-04 sic     Time: 01.057 secs; Max mem: 2.038 GB PyObj#: 0113437;
00005 processing: 1870-05 sic     Time: 01.046 secs; Max mem: 2.038 GB PyObj#: 0113327;
00006 processing: 1870-06 sic     Time: 01.013 secs; Max mem: 2.038 GB PyObj#: 0113328;
00007 processing: 1870-07 sic     Time: 00.985 secs; Max mem: 2.038 GB PyObj#: 0113329;
00008 processing: 1870-08 sic     Time: 01.034 secs; Max mem: 2.038 GB PyObj#: 0113330;
00009 processing: 1870-09 sic     Time: 00.990 secs; Max mem: 2.038 GB PyObj#: 0113269;
00010 processing: 1870-10 sic     Time: 00.975 secs; Max mem: 2.038 GB PyObj#: 0113270;
00011 processing: 1870-11 sic     Time: 01.035 secs; Max mem: 2.038 GB PyObj#: 0113271;
00012 processing: 1870-12 sic     Time: 01.260 secs; Max mem: 2.038 GB PyObj#: 0113272;
00013 processing: 1871-01 sic     Time: 01.047 secs; Max mem: 2.038 GB PyObj#: 0113221;
00014 processing: 1871-02 sic     Time: 01.051 secs; Max mem: 2.038 GB PyObj#: 0113111;
00015 processing: 1871-03 sic     Time: 01.010 secs; Max mem: 2.038 GB PyObj#: 0113112;
00016 processing: 1871-04 sic     Time: 00.969 secs; Max mem: 2.038 GB PyObj#: 0113113;
00017 processing: 1871-05 sic     Time: 00.976 secs; Max mem: 2.038 GB PyObj#: 0113225;
00018 processing: 1871-06 sic     Time: 01.044 secs; Max mem: 2.038 GB PyObj#: 0113115;
00019 processing: 1871-07 sic     Time: 01.200 secs; Max mem: 2.038 GB PyObj#: 0113116;
00020 processing: 1871-08 sic     Time: 01.053 secs; Max mem: 2.038 GB PyObj#: 0113117;
00021 processing: 1871-09 sic     Time: 01.556 secs; Max mem: 2.038 GB PyObj#: 0113118;
00022 processing: 1871-10 sic     Time: 00.971 secs; Max mem: 2.038 GB PyObj#: 0113119;

But if I switch to the mesa-based environment I get:

(nightly) doutriaux1@oceanonly:[Paul]:[1897]> python test_leak.py 
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightly
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 01.813 secs; Max mem: 1.494 GB PyObj#: 0113365;
00002 processing: 1870-02 sic     Time: 01.426 secs; Max mem: 1.961 GB PyObj#: 0113336;
00003 processing: 1870-03 sic     Time: 01.177 secs; Max mem: 1.961 GB PyObj#: 0113407;
00004 processing: 1870-04 sic     Time: 01.169 secs; Max mem: 1.974 GB PyObj#: 0113297;
00005 processing: 1870-05 sic     Time: 01.180 secs; Max mem: 1.987 GB PyObj#: 0113298;
00006 processing: 1870-06 sic     Time: 01.175 secs; Max mem: 2.001 GB PyObj#: 0113299;
00007 processing: 1870-07 sic     Time: 01.218 secs; Max mem: 2.014 GB PyObj#: 0113300;
00008 processing: 1870-08 sic     Time: 01.123 secs; Max mem: 2.027 GB PyObj#: 0113301;
00009 processing: 1870-09 sic     Time: 01.394 secs; Max mem: 2.040 GB PyObj#: 0113241;
00010 processing: 1870-10 sic     Time: 01.109 secs; Max mem: 2.053 GB PyObj#: 0113242;
00011 processing: 1870-11 sic     Time: 01.127 secs; Max mem: 2.066 GB PyObj#: 0113243;
00012 processing: 1870-12 sic     Time: 01.093 secs; Max mem: 2.079 GB PyObj#: 0113244;

It is EXACTLY the same python underneath. @danlipsa I think it's either a mesalib or a VTK-built-against-mesalib issue. Your thoughts?


durack1 commented Aug 23, 2017

@doutriaux1 @danlipsa and to fill in the holes, the reason I can't use X11 is that with the VNC session (and bg=1) I get all-black/blank pngs.


durack1 commented Aug 23, 2017

@doutriaux1 it's also interesting to note that in your example above it's not just a single additional object each time, as the transitions show:

00001 processing: 1870-01 sic     Time: 02.280 secs; Max mem: 1.543 GB PyObj#: 0113551;

00002 processing: 1870-02 sic     Time: 02.083 secs; Max mem: 2.008 GB PyObj#: 0113254;

00003 processing: 1870-03 sic     Time: 01.329 secs; Max mem: 2.038 GB PyObj#: 0113366;

00004 processing: 1870-04 sic     Time: 01.057 secs; Max mem: 2.038 GB PyObj#: 0113437;

00005 processing: 1870-05 sic     Time: 01.046 secs; Max mem: 2.038 GB PyObj#: 0113327;
00006 processing: 1870-06 sic     Time: 01.013 secs; Max mem: 2.038 GB PyObj#: 0113328;
00007 processing: 1870-07 sic     Time: 00.985 secs; Max mem: 2.038 GB PyObj#: 0113329;
00008 processing: 1870-08 sic     Time: 01.034 secs; Max mem: 2.038 GB PyObj#: 0113330;

00009 processing: 1870-09 sic     Time: 00.990 secs; Max mem: 2.038 GB PyObj#: 0113269;
00010 processing: 1870-10 sic     Time: 00.975 secs; Max mem: 2.038 GB PyObj#: 0113270;
00011 processing: 1870-11 sic     Time: 01.035 secs; Max mem: 2.038 GB PyObj#: 0113271;
00012 processing: 1870-12 sic     Time: 01.260 secs; Max mem: 2.038 GB PyObj#: 0113272;

00013 processing: 1871-01 sic     Time: 01.047 secs; Max mem: 2.038 GB PyObj#: 0113221;

00014 processing: 1871-02 sic     Time: 01.051 secs; Max mem: 2.038 GB PyObj#: 0113111;
00015 processing: 1871-03 sic     Time: 01.010 secs; Max mem: 2.038 GB PyObj#: 0113112;
00016 processing: 1871-04 sic     Time: 00.969 secs; Max mem: 2.038 GB PyObj#: 0113113;

00017 processing: 1871-05 sic     Time: 00.976 secs; Max mem: 2.038 GB PyObj#: 0113225;

00018 processing: 1871-06 sic     Time: 01.044 secs; Max mem: 2.038 GB PyObj#: 0113115;
00019 processing: 1871-07 sic     Time: 01.200 secs; Max mem: 2.038 GB PyObj#: 0113116;
00020 processing: 1871-08 sic     Time: 01.053 secs; Max mem: 2.038 GB PyObj#: 0113117;
00021 processing: 1871-09 sic     Time: 01.556 secs; Max mem: 2.038 GB PyObj#: 0113118;
00022 processing: 1871-10 sic     Time: 00.971 secs; Max mem: 2.038 GB PyObj#: 0113119;

Some of these steps have a single additional Python object created (the number after the PyObj#: entry), whereas at other times the change is much larger than 1

@doutriaux1

@durack1 some also have a large number of Python objects going down; probably gc related.

To answer your question: you had forgotten to remove the tmpl created in initVCS via tmpl = vcs.createtemplate()
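
For reference, a minimal sketch of that fix (hypothetical stand-in data, not the actual initVCS code): release the template once the plot is done with it.

import MV2
import numpy
import vcs

canvas = vcs.init()
data = MV2.array(numpy.random.rand(10, 10))  # stand-in for the sliced variable

tmpl = vcs.createtemplate()     # per-plot template, as created in initVCS
canvas.plot(data, tmpl, bg=1)
canvas.clear()
vcs.removeobject(tmpl)          # remove the template so it does not accumulate
canvas.close()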

@doutriaux1

@durack1 when plotting s1s instead of s1, and using s1s for the diff instead of s1 (which is what you actually meant), we get a smaller memory footprint and the bug takes longer to appear:

UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightly
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 02.521 secs; Max mem: 1.039 GB PyObj#: 0113442;
00002 processing: 1870-02 sic     Time: 02.421 secs; Max mem: 1.039 GB PyObj#: 0113443;
00003 processing: 1870-03 sic     Time: 02.236 secs; Max mem: 1.039 GB PyObj#: 0113444;
00004 processing: 1870-04 sic     Time: 02.225 secs; Max mem: 1.039 GB PyObj#: 0113445;
00005 processing: 1870-05 sic     Time: 02.227 secs; Max mem: 1.039 GB PyObj#: 0113425;
00006 processing: 1870-06 sic     Time: 02.474 secs; Max mem: 1.039 GB PyObj#: 0113391;
00007 processing: 1870-07 sic     Time: 02.645 secs; Max mem: 1.039 GB PyObj#: 0113392;
00008 processing: 1870-08 sic     Time: 02.649 secs; Max mem: 1.039 GB PyObj#: 0113393;
00009 processing: 1870-09 sic     Time: 02.634 secs; Max mem: 1.053 GB PyObj#: 0113394;
00010 processing: 1870-10 sic     Time: 02.603 secs; Max mem: 1.097 GB PyObj#: 0113395;
00011 processing: 1870-11 sic     Time: 02.626 secs; Max mem: 1.147 GB PyObj#: 0113396;
00012 processing: 1870-12 sic     Time: 02.627 secs; Max mem: 1.198 GB PyObj#: 0113397;
00013 processing: 1871-01 sic     Time: 02.645 secs; Max mem: 1.249 GB PyObj#: 0113233;
00014 processing: 1871-02 sic     Time: 02.655 secs; Max mem: 1.291 GB PyObj#: 0113234;
00015 processing: 1871-03 sic     Time: 02.662 secs; Max mem: 1.343 GB PyObj#: 0113235;


durack1 commented Aug 23, 2017

@doutriaux1 thanks. This script is just a dummy/demo for me at the moment, so I don't really care about the png output; I was just hoping to highlight the timing and memory issues in a repeatable way so that you and @danlipsa can dig in and figure out a persistent solution.

I think your comments on the gc are right: as you can see, every 5 iterations or so the total object count drops by 20 (4->5) or more, e.g. 164 (12->13) above
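
If the PyObj# column is meant to track a leak rather than collector noise, a small (hypothetical) tweak to the logging helps: force a full collection before counting, so the number only moves when objects genuinely persist.

import gc

def live_object_count():
    """Count live Python objects after a forced collection, so cyclic
    garbage awaiting the collector does not inflate the number."""
    gc.collect()
    return len(gc.get_objects())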


durack1 commented Aug 23, 2017

@doutriaux1 the huge virtual memory footprint is also concerning:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 66302 durack1   20   0 12.3g 1.4g  72m R 99.9  0.6   1415:55 python make_newVsOldDiffs.py

12.3 GB virtual, when the script is only generating a bunch of ~100 KB pngs, is kind of concerning.
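
For what it's worth, a Linux-only sketch (not part of the current script) that would let the log report virtual and resident size alongside the existing counters, by reading /proc/self/status:

def process_memory_gb():
    """Return (virtual, resident) memory of this process in GB (Linux only)."""
    sizes = {}
    with open('/proc/self/status') as status:
        for line in status:
            if line.startswith(('VmSize:', 'VmRSS:')):
                key, value = line.split()[:2]
                sizes[key.rstrip(':')] = int(value) / 1024.**2  # kB -> GB
    return sizes.get('VmSize', 0.0), sizes.get('VmRSS', 0.0)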

@doutriaux1

@durack1 I will send you the memory-leak-fixed script tomorrow. But yes, with mesalib you get the memory leak; there is nothing I can do about it right now. I will stop by tomorrow and try to get your pngs to be non-black.

durack1 added a commit that referenced this issue Nov 26, 2019