Fix comparison error for pandas dataframe dtype #1054

Merged


Shelnutt2
Member

When the dtype comes directly from pandas, the value passed in is likely a numpy.dtype instance rather than a string. We therefore need to check its type before comparing it to the string "ascii".
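As a rough illustration of the kind of guard described above, here is a minimal sketch. It is not the actual patch: `StrictDtype`, `is_ascii_unguarded`, and `is_ascii_guarded` are hypothetical names, and `StrictDtype` is only a stand-in that mimics the failure reported for NumPy 1.20 (a dtype compared against an unrecognized string raises `TypeError`), so the example behaves the same regardless of which NumPy version is installed.

```python
# Hypothetical stand-in for numpy.dtype as it behaved in the failing run:
# comparing against a string it cannot parse raises TypeError.
KNOWN_NAMES = {"int8", "int16", "int32", "int64"}


class StrictDtype:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, StrictDtype):
            return self.name == other.name
        if isinstance(other, str):
            if other not in KNOWN_NAMES:
                raise TypeError(f"data type {other!r} not understood")
            return self.name == other
        return NotImplemented


def is_ascii_unguarded(dtype):
    # Compares directly; blows up when dtype is a StrictDtype.
    return dtype == "ascii"


def is_ascii_guarded(dtype):
    # Only compares when dtype is already a plain string.
    return isinstance(dtype, str) and dtype == "ascii"


if __name__ == "__main__":
    int64 = StrictDtype("int64")
    print(is_ascii_guarded("ascii"))
    print(is_ascii_guarded(int64))
    try:
        is_ascii_unguarded(int64)
    except TypeError as exc:
        print("unguarded comparison failed:", exc)
```

The `isinstance` check short-circuits, so the string comparison is never attempted on a dtype object.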

@Shelnutt2 Shelnutt2 requested a review from nguyenv April 28, 2022 14:03
@shortcut-integration

This pull request has been linked to Shortcut Story #17286: TypeError: data type 'ascii' not understood.

@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/sc-17286/typeerror-data-type-ascii-not-understood branch from c0384ad to d53ca14 Compare April 28, 2022 14:04
Collaborator

@nguyenv nguyenv left a comment

I can confirm this fixes a test that we've seen fail with NumPy 1.20 (it passes with NumPy 1.22, the latest release). We'll be updating CI so that it tests against multiple versions of NumPy.

(tiledb-clean) vivian@mangonada:~/TileDB-Py$ pip list | grep numpy
numpy            1.20.0
(tiledb-clean) vivian@mangonada:~/TileDB-Py$ git branch --show-current
dev
(tiledb-clean) vivian@mangonada:~/TileDB-Py$ pytest -k "test_sparse_index_dtypes[i8]"
================================================ test session starts ================================================
platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/vivian/TileDB-Py, configfile: pyproject.toml, testpaths: tiledb/tests
plugins: hypothesis-6.45.1
collected 450 items / 449 deselected / 1 skipped / 1 selected

tiledb/tests/test_libtiledb.py F                                                                              [100%]

===================================================== FAILURES ======================================================
___________________________________ TestSparseArray.test_sparse_index_dtypes[i8] ____________________________________

self = <tiledb.tests.test_libtiledb.TestSparseArray object at 0x7ff47b0f9340>, dtype = 'i8'

    @pytest.mark.skipif(not has_pandas(), reason="pandas not installed")
    @pytest.mark.parametrize("dtype", INTEGER_DTYPES)
    def test_sparse_index_dtypes(self, dtype):
        path = self.path()
        data = np.arange(0, 3).astype(dtype)

>       schema = schema_from_dict(attrs={"attr": data}, dims={"d0": data})

tiledb/tests/test_libtiledb.py:2248:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tiledb/util.py:37: in schema_from_dict
    return _sparse_schema_from_dict(attrs, dims)
tiledb/util.py:8: in _sparse_schema_from_dict
    attr_infos = {k: ColumnInfo.from_values(v) for k, v in input_attrs.items()}
tiledb/util.py:8: in <dictcomp>
    attr_infos = {k: ColumnInfo.from_values(v) for k, v in input_attrs.items()}
tiledb/dataframe_.py:109: in from_values
    return cls.from_dtype(array_like.dtype, varlen_types)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'tiledb.dataframe_.ColumnInfo'>, dtype = dtype('int64'), varlen_types = ()

    @classmethod
    def from_dtype(cls, dtype, varlen_types=()):
        from pandas.api import types as pd_types

>       if dtype == "ascii":
E       TypeError: data type 'ascii' not understood

tiledb/dataframe_.py:115: TypeError
============================================== short test summary info ==============================================
FAILED tiledb/tests/test_libtiledb.py::TestSparseArray::test_sparse_index_dtypes[i8] - TypeError: data type 'ascii...
=================================== 1 failed, 1 skipped, 449 deselected in 0.93s ====================================
(tiledb-clean) vivian@mangonada:~/TileDB-Py$ git branch --show-current
sethshelnutt/sc-17286/typeerror-data-type-ascii-not-understood
(tiledb-clean) vivian@mangonada:~/TileDB-Py$ pip list | grep numpy
numpy            1.20.0
(tiledb-clean) vivian@mangonada:~/TileDB-Py$ git branch --show-current
sethshelnutt/sc-17286/typeerror-data-type-ascii-not-understood
(tiledb-clean) vivian@mangonada:~/TileDB-Py$ pytest -k "test_sparse_index_dtypes[i8]"
================================================ test session starts ================================================
platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/vivian/TileDB-Py, configfile: pyproject.toml, testpaths: tiledb/tests
plugins: hypothesis-6.45.1
collected 449 items / 448 deselected / 1 skipped / 1 selected

tiledb/tests/test_libtiledb.py .                                                                              [100%]

=================================== 1 passed, 1 skipped, 448 deselected in 0.80s ====================================

@nguyenv
Collaborator

nguyenv commented Apr 28, 2022

[sc-17286]
[sc-16509]

@shortcut-integration

This pull request has been linked to Shortcut Story #16509: Test failure against numpy 1.20.

@nguyenv nguyenv merged commit 4d20eaf into dev Apr 28, 2022
@nguyenv nguyenv deleted the sethshelnutt/sc-17286/typeerror-data-type-ascii-not-understood branch April 28, 2022 21:03