MEX SRC requires large amount of memory in qnet (possibly other apps) #5063
Comments
This is expected with the Bullet ray tracing engine. Each image loads its own copy of the DSK into memory. I'm surprised you didn't encounter this issue with the HRSC data. I implemented a shared DSK model with Embree, but not Bullet. @KrisBecker wrote one for Bullet and we're hoping to have that integrated into the code base this year.
The SRC images have bullet encoded on their labels, and the ShapeModel group has been commented out in my preference file since it is creating errors for other projects (#5062). The HRSC images, on the other hand, were originally processed using the default ray tracing (so no keyword on the label) since I didn't know how to invoke bullet or any other ray tracing engine. Since that time, up until recently, bullet-specific uses have been via my preference file. HRSC used a minimal amount of memory for qnet and other apps. Is there any reason why bullet on the label vs a call via the preference file would matter? I can no longer use the isis ray tracing engine under isis7.1.0 for HRSC (#5036), so it has to be bullet (or I guess embree - do I want to use that instead??), so I might need to reprocess those images to get bullet on the label to see if that matters for that data set or not. Though I don't understand why it would. This still seems like an SRC-specific problem.
Right now when you load a camera model for an image with the Bullet ray tracing engine, it will read the DSK file into memory and generate an in-memory copy of the file for use with the Bullet library. This means every single image/camera has its own copy of the DSK which is effectively a duplicate of the others. If you are using the Embree ray tracing engine, before it loads a copy of the DSK into memory, it checks to see if anything else has loaded it into memory and if it has it will re-use that. So, with the Embree ray tracing engine, all of the different images/cameras share the in memory copy of the DSK. Kris wrote some code that does effectively the same thing for the Bullet ray tracing model and we're hoping to have a PR with those changes come in this year.
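The difference between the two loading strategies described above can be sketched in a few lines. This is only an illustration of the caching idea, not the ISIS, Embree, or Bullet code; `ShapeModel`, `load_private`, `load_shared`, and the file name `phobos.bds` are all hypothetical names made up for the example.

```python
import weakref


class ShapeModel:
    """Hypothetical stand-in for an expensive in-memory copy of a DSK."""

    def __init__(self, path):
        self.path = path
        self.data = bytearray(1024)  # placeholder for the real plate data


# Cache keyed by file path. Entries vanish once no camera holds the model.
_shared_models = weakref.WeakValueDictionary()


def load_private(path):
    """Per-image behavior: every call builds its own copy."""
    return ShapeModel(path)


def load_shared(path):
    """Shared behavior: reuse a model that is already in memory."""
    model = _shared_models.get(path)
    if model is None:
        model = ShapeModel(path)
        _shared_models[path] = model
    return model


# Three "cameras" opening the same (hypothetical) DSK file.
private = [load_private("phobos.bds") for _ in range(3)]
shared = [load_shared("phobos.bds") for _ in range(3)]

distinct_private = len({id(m) for m in private})
distinct_shared = len({id(m) for m in shared})
print(distinct_private, distinct_shared)
```

With the per-image strategy the number of in-memory copies grows with the number of images opened, while the shared strategy keeps a single copy per DSK file no matter how many cameras reference it.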
This shouldn't matter but I wouldn't be surprised if there is a bug and the interactive programs aren't properly respecting your user IsisPreferences file.
If you are pushing forward with using the DSK for control processing, then you will need to use Embree for right now because of this memory issue. Embree only provides floating point (32-bit) precision intersections, though. Bullet provides double (64-bit) precision intersections so you will want to use Bullet for final processing after we get the shared Bullet model implemented. All of this memory stuff is why I have been saying we can't do control processing with DSKs until we get the code Kris wrote incorporated into dev. |
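As a rough illustration of what 32-bit versus 64-bit precision means for a stored coordinate, the sketch below rounds a value through a 32-bit float and measures the difference. The coordinate is a made-up number chosen to be on the order of a small body's radius in meters, and the result only shows storage rounding; it says nothing about the actual intersection error of either engine.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical surface coordinate in meters (not taken from any real DSK).
coordinate_m = 11266.123456

stored = to_float32(coordinate_m)
error_m = abs(stored - coordinate_m)

print(f"64-bit value : {coordinate_m!r}")
print(f"32-bit value : {stored!r}")
print(f"rounding error in meters: {error_m:.6f}")
```

For a value of this magnitude the spacing between adjacent 32-bit floats is about a millimeter, so the rounding error is at most about half of that.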
Except there were no issues for HRSC. But as you pointed out there could be some bug with programs and how ray tracing is being specified. I'll see if anything jumps out there. Waiting on external contributions seems like a shortfall to me and not something we should have to rely on, though I understand why we wouldn't want to just address things ourselves knowing that something might be soon to come. Hard call really. I hadn't heard your arguments in regard to waiting on external code until now (or they hadn't sunk in/made sense at the time). I've tried to work with a DEM for HRSC and it was sort of a disaster, whereas the DSK and bullet was a good combination. I'll take a look at embree as well and see what that behavior is like. My initial tests with HRSC last year used the ellipsoid and worked out ok (for about 45 images though), so that is always a fallback, but having a shape model is always better. I suspect by the time we get any bullet contribution I'll be done with generating networks (HRSC is done already, working on SRC now, then need to connect the 2 with any remaining time; working on this now due to other projects impacted by other issues...). Things I'll try and report on:
It is particularly tough with the shared Bullet models because it is incorporated into other contributions that Kris wants to submit back to us for general public release. So, if we implement it ourselves it will create a bunch more conflicts on his end that he will need to resolve. I'm much more optimistic about getting the external contributions incorporated sooner rather than later because Kris has funding to do it under the O-REx extended mission. Until now, he's been doing things under extra hours and a few small PDARTs.
Well, embree throws errors. This is a new bug based on HRSC tests from June when there were no errors. I will get a post in for that.
I have determined there is no difference in how qnet behaves when bullet is on the image label versus being specified from a preference file. Additionally, I have confirmed that qnet memory consumption is indeed happening for HRSC as well as SRC, as you thought @jessemapel. I managed to avoid it last year because I processed the data without specifying bullet anywhere (because I didn't know I needed to) and when I did start to understand things and access bullet for some applications, it was always via the preference file. Since qnet can't take a preference file on the command line, that application used the isis ray tracing instead which I have confirmed does not consume any extra memory.
Since I have generally only had good results using bullet while processing HRSC and now SRC, I will continue to use it. Other than the large memory consumption for qnet, the only other application I noticed that was using a lot of memory was cnetcheck with NOLATLON=true which is necessary during a first pass for near limb points. The findfeatures workflow (isisminer overlap, findfeatures, cnetcombinept), cnetstats, cnetedit, etc. have not been negatively impacted by the bullet ray tracing behavior probably because they don't open thousands of images at once or at all. Bullet is essential for SRC since footprints and phocube shapes are bad/nonsensical via isis ray tracing. I will use the cluster for high memory applications and work around the other problems for the time being.
@jessemapel - should we keep this post open until changes are made to how bullet operates? I think we should since it will of course be a good test of everything, but let me know if you think otherwise. As usual, thanks for helping me better understand how some of these new methods work.
Yeah I think we should leave this open as I don't think there's any other documentation on the repo for that. I appreciate you being the first one of our internal users really pushing the ray tracing stuff. It's frustrating encountering all these errors, but we have to know the problems before we can fix them.
Thank you for your contribution! Unfortunately, this issue hasn't received much attention lately, so it is labeled as 'stale.' If no additional action is taken, this issue will be automatically closed in 180 days.
@lwellerastro and @jessemapel agreed that this issue should remain open until changes to bullet are made (or at least better documentation).
I don't think this post should be considered a priority - we understand the problem and there may be future fixes/contributions from an outside source, though there is no timeline or true commitment that I am aware of. It is not blocking any immediate work, but it's still an issue and should probably stay open until addressed.
ISIS version(s) affected: 7.1.0
Description
When attempting to load about 2500 MEX SRC Phobos images in qnet with a small network (or none), the program uses an excessive amount of memory, forcing me to kill the app before it consumes all available memory. I have seen it exceed 60 G, and the VMs I have access to typically have about 50 G+ available, or near 100 G in one case. It also takes >>1 hour to load (when I ran on a cluster node interactively), but I think that might have to do with accessing the data area via an externally mounted location, which is a different problem. And on that note, SRC still consumes an excessive amount of memory even while pointing to an old, locally mounted data area.
I have confirmed that >>5000 Kaguya TC images use about 4 G when loading into qnet, and >1000 MEX HRSC images use about 2 G. This is along with loading very large networks. SRC consumes tons of memory regardless of the size or inclusion of a network.
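For a rough sense of scale, the figures above can be turned into a per-image estimate. This is only back-of-the-envelope arithmetic: it assumes "G" means GiB and treats the image counts as exact, even though the Kaguya and HRSC counts are lower bounds and the SRC load was killed before it finished.

```python
# (observed memory in GiB, number of images) taken from the description above.
reports = {
    "MEX SRC": (60, 2500),
    "Kaguya TC": (4, 5000),
    "MEX HRSC": (2, 1000),
}

# Approximate memory per loaded image, in MiB.
per_image_mib = {
    name: gib * 1024 / count for name, (gib, count) in reports.items()
}

for name, mib in per_image_mib.items():
    print(f"{name:10s} ~{mib:.1f} MiB per image")
```

Under these assumptions SRC needs roughly ten times more memory per image than HRSC and well over twenty times more than Kaguya TC, even though the SRC number is itself an underestimate.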
How to reproduce
Launch qnet and load a list of SRC images (no network required).
I have placed my full SRC image list and a small network in Isis3Tests/MEX_SRC/Qnet/ in my user work area. All images reside on our /scratch disk.
In order to work with these data a user must be in isis7.1.0 as the camera model is new and does not reside elsewhere. These images have been initialized with a DSK shape model using the bullet ray tracing engine (except for a handful due to failures, but isis ray tracing worked for those), but so has HRSC, and those data have not had any issues like this.
Please let me know if having access to HRSC images and/or other datasets is useful.
I will report here if other applications seem to be using a large amount of memory to work with these images.