Running info on pandas.DataFrame with time column doesn't work #597

Closed
weiji14 opened this issue Sep 10, 2020 · 9 comments · Fixed by #1236

@weiji14
Member

weiji14 commented Sep 10, 2020

Description of the problem

Just noticed that passing datetime columns into pygmt.info doesn't work. This follows on from the pandas.DataFrame input support for pygmt.info added in #574; see also #464 and #562, where the datetime machinery should be more or less implemented.

Full code that generated the error

import pygmt
import pandas as pd

table = pd.DataFrame(data=[1,3,2,5,4], columns=["z"])
table["time"] = pd.date_range(start="2020-01-01", periods=5)

pygmt.info(table=table)

Note that the equivalent gmt command does work on datetime inputs.

!gmt info temp.txt
temp.txt: N = 5	<1/5>	<2020-01-01T00:00:00/2020-01-05T00:00:00>

Full error message

---------------------------------------------------------------------------
GMTCLibError                              Traceback (most recent call last)
<ipython-input-6-a9e68c3dc07e> in <module>
----> 1 pygmt.info(table=table)

~/pygmt/pygmt/helpers/decorators.py in new_module(*args, **kwargs)
    235                 if alias in kwargs:
    236                     kwargs[arg] = kwargs.pop(alias)
--> 237             return module_func(*args, **kwargs)
    238 
    239         new_module.aliases = aliases

~/pygmt/pygmt/modules.py in info(table, **kwargs)
    116 
    117         with GMTTempFile() as tmpfile:
--> 118             with file_context as fname:
    119                 arg_str = " ".join(
    120                     [fname, build_arg_string(kwargs), "->" + tmpfile.name]

~/miniconda3/envs/pygmt/lib/python3.7/contextlib.py in __enter__(self)
    110         del self.args, self.kwds, self.func
    111         try:
--> 112             return next(self.gen)
    113         except StopIteration:
    114             raise RuntimeError("generator didn't yield") from None

~/pygmt/pygmt/clib/session.py in virtualfile_from_matrix(self, matrix)
   1268         )
   1269 
-> 1270         self.put_matrix(dataset, matrix)
   1271 
   1272         with self.open_virtual_file(

~/pygmt/pygmt/clib/session.py in put_matrix(self, dataset, matrix, pad)
    906         )
    907         if status != 0:
--> 908             raise GMTCLibError("Failed to put matrix of type {}.".format(matrix.dtype))
    909 
    910     def write_data(self, family, geometry, mode, wesn, output, data):

GMTCLibError: Failed to put matrix of type object.

System information

Please paste the output of python -c "import pygmt; pygmt.show_versions()":

PyGMT information:
  version: v0.1.2+55.g6deb388
System information:
  python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08)  [GCC 7.5.0]
  executable: ~/miniconda3/envs/pygmt/bin/python
  machine: Linux-4.19.0-8-amd64-x86_64-with-debian-10.5
Dependency information:
  numpy: 1.19.1
  pandas: 1.1.1
  xarray: 0.16.0
  netCDF4: 1.5.3
  packaging: 20.4
  ghostscript: 9.27
  gmt: None
GMT library information:
  binary dir: ~/miniconda3/envs/pygmt/bin
  cores: 2
  grid layout: rows
  library path: ~/miniconda3/envs/pygmt/lib/libgmt.so
  padding: 2
  plugin dir: ~/miniconda3/envs/pygmt/lib/gmt/plugins
  share dir: ~/miniconda3/envs/pygmt/share/gmt
  version: 6.1.1
@weiji14 weiji14 added the bug Something isn't working label Sep 10, 2020
@seisman
Member

seisman commented Sep 15, 2020

Here is the definition of the GMT_Put_Matrix function:

int GMT_Put_Matrix (void *API, struct GMT_MATRIX *M, unsigned int type, int pad, void *matrix) 

The third parameter, type, is the data type of the matrix, e.g., GMT_DOUBLE or GMT_FLOAT. This means that all elements of the matrix must have exactly the same data type. Thus, in PyGMT, we can't pass 2D numpy arrays with mixed data types to the put_matrix function.

The fix seems easy. We may have to pass 2D arrays as a series of vectors, via virtualfile_from_vectors.
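
To see why put_matrix ends up with "type object", here is a minimal sketch that uses only pandas and NumPy (no GMT calls) and reuses the table from the bug report. Converting the whole DataFrame to one 2D array forces a single common dtype, which for an integer column plus a datetime column is object. Taking the columns one at a time keeps each dtype intact, which is what a per-vector approach would rely on.

```python
import pandas as pd

# Same table as in the bug report: one integer column, one datetime column.
table = pd.DataFrame(data=[1, 3, 2, 5, 4], columns=["z"])
table["time"] = pd.date_range(start="2020-01-01", periods=5)

# A single 2D array needs one dtype for every element, so NumPy falls back to object.
matrix = table.to_numpy()
print(matrix.dtype)  # object

# One 1D array per column keeps the original dtypes.
vectors = [table[name].to_numpy() for name in table.columns]
print([vector.dtype.kind for vector in vectors])  # ['i', 'M'] (integer, datetime)
```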

@seisman
Member

seisman commented Sep 19, 2020

Ping @weiji14.

@weiji14
Member Author

weiji14 commented Sep 19, 2020

The third parameter, type, is the data type of the matrix, e.g., GMT_DOUBLE or GMT_FLOAT. This means that all elements of the matrix must have exactly the same data type. Thus, in PyGMT, we can't pass 2D numpy arrays with mixed data types to the put_matrix function.

The fix seems easy. We may have to pass 2D arrays as a series of vectors, via virtualfile_from_vectors.

Right, so we'll need to have something like an if-then or try-except to handle mixed dtypes. A couple of other details to consider:

  1. Do we switch to using put_vectors for info all the time (which will involve a for-loop), or do we check whether the dtypes are mixed and use put_vectors only then, falling back to put_matrix as usual otherwise?

Note that numpy.arrays always have a single dtype; it will just be np.object if the dtypes are mixed. pandas.DataFrames are the ones that can explicitly have different dtypes in different columns.

  2. Should we generalize info to properly handle other mixed dtype combinations (e.g. int32/float32/etc.)? I'm thinking about #547 (Add a test to check passing arrays of mixed dtypes to GMT) here.

I've got a unit test for this written up already and will submit a PR soon, just need to work out these implementation details 😄.
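
The second option in point 1 (check for mixed dtypes, then pick a path) could be sketched roughly as below. This is only an illustration: has_mixed_dtypes and choose_upload_path are hypothetical helper names, and the function returns the name of the session method rather than calling GMT, so it runs without PyGMT installed.

```python
import pandas as pd


def has_mixed_dtypes(table: pd.DataFrame) -> bool:
    """Return True if the columns of the table do not all share one dtype."""
    return len(set(table.dtypes)) > 1


def choose_upload_path(table: pd.DataFrame) -> str:
    """Hypothetical dispatcher: name the session method that could be used."""
    if has_mixed_dtypes(table):
        return "virtualfile_from_vectors"
    return "virtualfile_from_matrix"


uniform = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
mixed = pd.DataFrame({"z": [1, 2], "time": pd.date_range("2020-01-01", periods=2)})
print(choose_upload_path(uniform))  # virtualfile_from_matrix
print(choose_upload_path(mixed))    # virtualfile_from_vectors
```

Checking the DataFrame's dtypes before conversion avoids relying on the object dtype that only shows up after the table has already been flattened into one array.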

@weiji14
Member Author

weiji14 commented Sep 22, 2020

Just following up on this, we've merged in #619 so if you install PyGMT from the master branch, passing in datetime inputs won't result in "GMTCLibError: Failed to put matrix of type object." anymore. However, the datetime column's ranges will be reported in UNIX timestamps instead of ISO datetimes.
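
For anyone who gets UNIX timestamps back in the meantime, here is a small sketch of turning them into ISO datetimes by hand. It assumes the reported values are seconds since 1970-01-01T00:00:00 UTC; unix_to_iso is a hypothetical helper, and the two sample values are computed for the dates in the original report, not copied from actual GMT output.

```python
from datetime import datetime, timezone


def unix_to_iso(seconds: float) -> str:
    """Format seconds since the UNIX epoch (UTC) as an ISO 8601 datetime string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


print(unix_to_iso(1577836800))  # 2020-01-01T00:00:00
print(unix_to_iso(1578182400))  # 2020-01-05T00:00:00
```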

A workaround for this as mentioned at GenericMappingTools/gmt#4241 (comment) is to use something like pygmt.info(table=df, f="1T"), which would explicitly tell GMT that the second column is a datetime type, and should be handled that way.

We will close this issue once this upstream GMT issue at GenericMappingTools/gmt#4241 is resolved, and perhaps when PyGMT bumps the minimum required version to GMT 6.2.0 and/or when conda GMT 6.2.0.dev builds are available with conda-forge/gmt-feedstock#100.

@weiji14 weiji14 added the upstream Bug or missing feature of upstream core GMT label Sep 22, 2020
@weiji14 weiji14 mentioned this issue Sep 30, 2020
5 tasks
@weiji14 weiji14 added this to the 0.3.0 milestone Oct 27, 2020
@weiji14
Member Author

weiji14 commented Nov 11, 2020

A workaround for this as mentioned at GenericMappingTools/gmt#4241 (comment) is to use something like pygmt.info(table=df, f="1T"), which would explicitly tell GMT that the second column is a datetime type, and should be handled that way.

So the workaround doesn't quite work because of the way we've implemented things in #619 using np.loadtxt:

import pandas as pd
import pygmt

table = pd.date_range(start="2010-01-01", end="2020-01-01")
pygmt.info(table=table, spacing="1Y", f="0T")

errors with:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-cac984c8d7d8> in <module>
----> 1 pygmt.info(table=df[[time_var, elev_var]], spacing=f"1W/{spacing}", f="0T")

~/miniconda3/envs/pygmt/src/pygmt/pygmt/helpers/decorators.py in new_module(*args, **kwargs)
    268                 if alias in kwargs:
    269                     kwargs[arg] = kwargs.pop(alias)
--> 270             return module_func(*args, **kwargs)
    271 
    272         new_module.aliases = aliases

~/miniconda3/envs/pygmt/src/pygmt/pygmt/modules.py in info(table, **kwargs)
    137             if result.startswith(("-R", "-T")):  # e.g. -R0/1/2/3 or -T0/9/1
    138                 result = result[2:].replace("/", " ")
--> 139             result = np.loadtxt(result.splitlines())
    140 
    141         return result

~/miniconda3/envs/pygmt/lib/python3.8/site-packages/numpy/lib/npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
   1137         # converting the data
   1138         X = None
-> 1139         for x in read_data(_loadtxt_chunksize):
   1140             if X is None:
   1141                 X = np.array(x, dtype)

~/miniconda3/envs/pygmt/lib/python3.8/site-packages/numpy/lib/npyio.py in read_data(chunk_size)
   1065 
   1066             # Convert each value according to its column and store
-> 1067             items = [conv(val) for (conv, val) in zip(converters, vals)]
   1068 
   1069             # Then pack it according to the dtype's nesting

~/miniconda3/envs/pygmt/lib/python3.8/site-packages/numpy/lib/npyio.py in <listcomp>(.0)
   1065 
   1066             # Convert each value according to its column and store
-> 1067             items = [conv(val) for (conv, val) in zip(converters, vals)]
   1068 
   1069             # Then pack it according to the dtype's nesting

~/miniconda3/envs/pygmt/lib/python3.8/site-packages/numpy/lib/npyio.py in floatconv(x)
    761         if '0x' in x:
    762             return float.fromhex(x)
--> 763         return float(x)
    764 
    765     typ = dtype.type

ValueError: could not convert string to float: '2019-05-19T20:53:51'

np.loadtxt assumes that the text is to be read as floating point numbers, but datetimes like "2019-05-19T20:53:51" are not floats. We'll need to set the dtype using np.loadtxt(..., dtype=???), where ??? is "str,float" or something similar (ref https://stackoverflow.com/a/31554777/6611055).

result = np.loadtxt(result.splitlines())
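
One possible way around this, sketched below, is to try the float parse first and fall back to plain strings when it fails. This is an illustration of the idea rather than a statement of what PyGMT ends up doing; parse_info_output is a hypothetical helper name, and the input strings stand in for the text that info would return.

```python
import numpy as np


def parse_info_output(text: str) -> np.ndarray:
    """Parse whitespace-separated output as floats, or as strings if that fails."""
    lines = text.splitlines()
    try:
        # Works when every field is numeric, e.g. "0 9 1".
        return np.loadtxt(lines)
    except ValueError:
        # Datetime strings cannot be converted to float, so keep everything as text.
        return np.loadtxt(lines, dtype="str")


print(parse_info_output("0 9 1"))
print(parse_info_output("2010-01-01T00:00:00 2020-01-01T00:00:00 0 0"))
```

The trade-off is that one non-numeric field turns the whole result into strings, so callers would need to convert the numeric entries themselves.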

@weiji14
Member Author

weiji14 commented Mar 3, 2021

Alright, with #960 merged, anyone installing PyGMT from the master branch (see https://www.pygmt.org/v0.3.0/install.html#using-pip) should be able to use the coltypes="0T" workaround for GMT 6.1.1 (where 0T means the first column contains time), i.e.:

import pandas as pd
import pygmt

table = pd.date_range(start="2010-01-01", end="2020-01-01")
region = pygmt.info(table=table, spacing="1Y", coltypes="0T")
print(region)
# ['2010-01-01T00:00:00' '2020-01-01T00:00:00' '0' '0']

Assuming that GenericMappingTools/gmt#4241 is resolved in GMT 6.2.0, GMT 6.2.0 users won't need to use the coltypes parameter in the future (which saves people from needing to know the number of the time column).
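
Until then, the column number can be looked up from the DataFrame instead of counted by hand. The sketch below is a hypothetical helper (time_coltypes is not part of PyGMT) that returns the zero-based index of each datetime column followed by T. Joining several time columns with commas is an assumption about the GMT -f syntax; only the single-column case is shown here.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def time_coltypes(table: pd.DataFrame) -> str:
    """Hypothetical helper: build a coltypes string marking datetime columns as T."""
    return ",".join(
        f"{index}T"
        for index, dtype in enumerate(table.dtypes)
        if is_datetime64_any_dtype(dtype)
    )


# Table from the original report: "z" is column 0 and "time" is column 1.
table = pd.DataFrame(data=[1, 3, 2, 5, 4], columns=["z"])
table["time"] = pd.date_range(start="2020-01-01", periods=5)
print(time_coltypes(table))  # 1T
```

The result could then be passed along as coltypes=time_coltypes(table) when calling pygmt.info.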

@weiji14 weiji14 unpinned this issue Mar 3, 2021
@weiji14
Member Author

weiji14 commented Mar 14, 2021

FYI, GenericMappingTools/gmt#4241 has been magically resolved, so this issue can be resolved when PyGMT bumps the minimum version to GMT 6.2.0!

@maxrjones
Member

Fixed by GenericMappingTools/gmt#4849

@weiji14
Member Author

weiji14 commented Apr 22, 2021

Phew, thanks team, glad to close down another >6 month old issue!
