Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the testapi code to plot vectors #3528

Merged
merged 6 commits into from
Jun 28, 2020
Merged

Add the testapi code to plot vectors #3528

merged 6 commits into from
Jun 28, 2020

Conversation

seisman
Copy link
Member

@seisman seisman commented Jun 23, 2020

Description of proposed changes

Fixes #

Reminders

  • Correct base branch selected? master for new features, 6.1 for bug fixes
  • Make sure that your code follows our style. Use the other functions/files as a basis.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Describe changes to function behavior and arguments in a comment below the function declaration.
  • If adding new functionality, add a detailed description to the documentation and/or an example.

@PaulWessel
Copy link
Member

OK, this is a good test. What happens is that you tell GMT these are read-only data. Then it turns out that the operation you have selected (psxy lines with arrows) requires resampling. Having already read the data as read-only, now it would be up to the module to change that and allocate space (this is not happening, of course, and we reallocate and then try to free...). It seems inefficient to have to add such checks to all modules that may need to reallocate or resample the data, and it may be difficult for the i/o machinery to know what will be needed by the module calling it. Here are some possible solutions to this issue:

  1. Give up on accepting read-only (reference) memory and always duplicate.
  2. Flag which modules may, depending on options, require a reallocation or resampling of the data, and have the IO automatically switch GMT_IS_REFERENCE to GMT_IS_DUPLICATE.
  3. Add specific tests in modules (such as psxy) to detect this situation and then do a reallocation into gmt-controlled data for further use.
  • Option 1 seems wasteful. There are plenty of GMT modules where resampling etc is not an issue. Think input to blockmean, surface, gmtmath, many grid modules.
  • Option 2 tries to abstract away the problem so that the modules themselves dont have to take specific actions. I imagine that perhaps the THIS_MODULE_NEEDS setting could get another letter, say, *m for modify, meaning that the input to this module may need reallocation and hence should duplicate on input, and we would have to go through the modules to determine which one actually may modify data. So most modules are unchanged while psxy and others like it certainly gets a m. Then, if GMT_IS_REFERENCE is passed as in this example it automatically gets switched to GMT_IS_DUPLICATE. It has no effect on the user side apart from not clobbering their arrays. I think only a very small subset of modules needs an m flag.
  • Option 3 would still require us to make the determination in option 2, but now the action we must take has to be coded into the module workflow. I.e., after reading in the data we may have to add an if-test that duplicates the data and frees the original container when we no longer can be read-only. To me this seems much more involved.

Happy to hear other suggestions, @joa-quim and @seisman, but to me option 2 is cleaner. Yes, this probably means that there are times we duplicate when we did not have to, but we would be safe from running into such crashes.

@seisman
Copy link
Member Author

seisman commented Jun 23, 2020

I'm not sure if I explained the issue clearly. When GMT_IN is passed, the program crashes; when GMT_IN|GMT_IS_REFERENCE is passed, the program works (at least for me).

What happens is that you tell GMT these are read-only data.

As I understand it, GMT_IN means GMT can modify the data, and GMT_IN|GMT_IS_REFERENCE means the data are read-only to GMT. It makes sense for me, since the program works well with GMT_IN|GMT_IS_REFERENCE.

Then it turns out that the operation you have selected (psxy lines with arrows) requires resampling. Having already read the data as read-only, now it would be up to the module to change that and allocate space

Does it mean it should also crashes if GMT_IN|GMT_IS_REFERENCE is used?

@PaulWessel
Copy link
Member

OK when you pass GMT_IS_REFERENCE, GMT flags these so they are not freed. I will step through again, but yes I think if you just give GMT_IN you are saying "gmt may deal with these arrays are reallocate" but that is not true when coming from externals. For instance, arrays like x[3] = [1, 2 ,3} cannot be passed in without GMT_IS_REFERENCE since GMT cannot tell that this is a dynamic address we can reallocate or something static on the stack.

@PaulWessel
Copy link
Member

Looking at your case again. Yes, this is API user error since you are allocating a fixed array on the stack, then pass it to GMT as if it can be modified. As far as I know, when your x and y are passed into GMT, there is no way we can tell if those are dynamic or static arrays. So the onus is on you to use the correct flag.

@seisman
Copy link
Member Author

seisman commented Jun 23, 2020

Is it OK to say that, for external programs like PyGMT, we should always pass GMT_IN|GMT_IS_REFERENCE instead of GMT_IN?

@PaulWessel
Copy link
Member

I would like to say yes. I think that is always the case since we have no idea what we are being given.

@PaulWessel
Copy link
Member

OF course, most of the time there are no bad consequences since we just read from the array.

@PaulWessel
Copy link
Member

So my point still stands. If you want to pass a 1 Tb array from python into GMT, you would like not to have it duplicated unless it must be so. And thus is should not have to fall on you to remember which modules you should pass GMT_IS_REFERENCE and which ones you dont have to.

@joa-quim
Copy link
Member

I agree, and it sure is consensual, that:

  1. One duplicate only when one need to
  2. That decision should not fall on the external callers

But the issue is on how to achieve the only when one need to. From what I understood in your preferred solution 2. that would happen all the times for psxy (for example). Is that right?

@PaulWessel
Copy link
Member

Well, it would probably be good to determine the extent of the problem. It can affect lines and polygons (not points) in psxy[z] but depends on options (-A turns off resampling, for instance). THe question is how many modules are there where resampling can even happen. I think mostly in the plotters since the producers usually allocate output. Perhaps the problem is much smaller than I first thought and option 3 could work if it is just psxy[z].

seisman added a commit to GenericMappingTools/pygmt that referenced this pull request Jun 24, 2020
External programs like PyGMT can pass dataset/momory to GMT. By default,
GMT can read, modify and free the momery, which sometimes can cause
crashes.

Issue #406 reports an example in which PyGMT crashes. The issue was reported
to the upstream (see GenericMappingTools/gmt#3515
and GenericMappingTools/gmt#3528). It turns out
to be a API user error (i.e., a PyGMT bug).

As per the explanation of Paul, external programs like PyGMT should
always use `GMT_IN|GMT_IS_REFERENCE` to tell GMT that the data is
read-only, so that GMT won't try to change and free the memory.

This PR makes the change from `GMT_IN` to `GMT_IN|GMT_IS_REFERENCE`
in the `Session.open_virtual_file()` function, updates a few docstrings,
and also adds the script in #406 as a test.
seisman added a commit to GenericMappingTools/pygmt that referenced this pull request Jun 24, 2020
External programs like PyGMT can pass dataset/momory to GMT. By default,
GMT can read, modify and free the momery, which sometimes can cause
crashes.

Issue #406 reports an example in which PyGMT crashes. The issue was reported
to the upstream (see GenericMappingTools/gmt#3515
and GenericMappingTools/gmt#3528). It turns out
to be a API user error (i.e., a PyGMT bug).

As per the explanation of Paul, external programs like PyGMT should
always use `GMT_IN|GMT_IS_REFERENCE` to tell GMT that the data is
read-only, so that GMT won't try to change and free the memory.

This PR makes the change from `GMT_IN` to `GMT_IN|GMT_IS_REFERENCE`
in the `Session.open_virtual_file()` function, updates a few docstrings,
and also adds the script in #406 as a test.
@PaulWessel
Copy link
Member

After merging #3532 I believe you should add GMT_IS_REFERENCE to GMT_IN here and it should be fine. If not then I can check.

@seisman
Copy link
Member Author

seisman commented Jun 27, 2020

After merging #3532, passing GMT_IN works, but passing GMT_IN|GMT_IS_REFERENCE crashes.

@PaulWessel
Copy link
Member

OK, will look at this later. Dentist appointment...

@PaulWessel
Copy link
Member

The key problem with this and other psxy, psxyz, sphdstance calls from external environment is that they all may call gmt_fix_up_path which may reallocate the arrays it is given (this also happens in grdmask, psclip, pscoast as well but we already had solutions in place or no external data is used). Now, this problem was partially dealt with but new features (like line trimming used in the above test example) was added later and the solution for the problem (duplicate external segments if resampling is needed) happened too late in the workflow. I have gone through all places that call gmt_fix_up_path, including via gmt_geo_polygons, and made sure that if data are external and resampling will/may be happening then we duplicate the segment. The testapi_vector_plot works for either modes for me now. Please see you are able to use GMT_IN|GMT_IS_REFERENCE from pyGMT.

@PaulWessel
Copy link
Member

Hi @seisman how is this branch working for you now? I merged in the grdimage stuff.

@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

Yes, now it works with both GMT_IN and GMT_IN|GMT_IS_REFERENCE.

Copy link
Member

@PaulWessel PaulWessel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, forgot that I need to add review. Looks good!

@PaulWessel PaulWessel changed the title WIP: Add the testapi code to plot vectors Add the testapi code to plot vectors Jun 28, 2020
@PaulWessel
Copy link
Member

Do you want me to do the honor, @seisman ?

@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

I'm adding a test. Will merge it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants