array.query no longer works for string attributes #307

Closed
bmccandless opened this issue May 6, 2020 · 2 comments · Fixed by #309
Comments

@bmccandless

I have a case that looks basically like this:

array = tiledb.DenseArray(uri, mode="r")
q = array.query(attrs=["attrname"])
data = q[:]

This works when the type of the attribute is a float or an int.
But if the type of the attribute is a string, this fails with an exception:

  File "tiledb/libtiledb.pyx", line 3874, in tiledb.libtiledb.Query.__getitem__
  File "tiledb/libtiledb.pyx", line 4107, in tiledb.libtiledb.DenseArrayImpl.subarray
  File "tiledb/libtiledb.pyx", line 4156, in tiledb.libtiledb.DenseArrayImpl._read_dense_subarray
  File "tiledb/libtiledb.pyx", line 3760, in tiledb.libtiledb.Array._unpack_varlen_query
  File "tiledb/libtiledb.pyx", line 3760, in tiledb.libtiledb.Array._unpack_varlen_query
  File "tiledb/libtiledb.pyx", line 3805, in tiledb.libtiledb.Array._unpack_varlen_query
SystemError: Negative size passed to PyUnicode_FromStringAndSize

This worked in v1.7.6, and I ran into this problem when porting to 2.0.0.
If there is a better way to pull out just one attribute from the array, then
I'm open to suggestions.

example.zip

Attached are a script and its output from 1.7.6 and from 2.0.0.

@bmccandless
Author

Followup: I misdiagnosed the problem.

It is not just with query, but also with indexing into the array.
My array was created with 1.7.6, if that matters. It has 5 attributes, 2 of which are strings.
I found that slicing into the array works up to a point, and I did a binary search to see exactly where it fails.
My array has 1305994 rows, but I can only slice up to 1305008.

Let me know if you'd like me to send the array.
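For anyone who wants to repeat the binary search on their own array, here is a minimal sketch of the idea. The helper name `last_good_stop` and the stand-in probe `fake_probe` are hypothetical and not part of TileDB-Py. With a real array, the probe would be something like "try `array[:stop]` and report whether it raised". The sketch assumes that once a slice bound fails, every larger bound fails too.

```python
from typing import Callable


def last_good_stop(works: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest stop in [lo, hi] for which works(stop) is True.

    Assumes works(lo) is True and that failures are monotone:
    if works(s) is False, then works(t) is False for every t > s.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            lo = mid          # mid is fine, the answer is mid or larger
        else:
            hi = mid - 1      # mid fails, the answer is strictly smaller
    return lo


# Stand-in probe so the sketch runs without the array. It mimics the
# behaviour reported above: slices up to 1305008 work, larger ones fail.
def fake_probe(stop: int) -> bool:
    return stop <= 1305008


print(last_good_stop(fake_probe, 0, 1305994))
```

Each probe halves the remaining range, so about 21 slices are enough for an array of this size.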

array = tiledb.DenseArray(uri, mode="r", ctx=ctx)
array.schema.dump()
print(array.shape)
print(array[:1305008])
print(array[:1305009])

output:

- Array type: dense
- Cell order: row-major
- Tile order: row-major
- Capacity: 10000
- Allows duplicates: false
- Coordinates filters: 1
  > ZSTD: COMPRESSION_LEVEL=-1
- Offsets filters: 1
  > ZSTD: COMPRESSION_LEVEL=-1

### Dimension ###
- Name: __dim_0
- Type: UINT32
- Cell val num: 1
- Domain: [0,1305993]
- Tile extent: 1000
- Filters: 0

### Attribute ###
- Name: name_0
- Type: STRING_UTF8
- Cell val num: var
- Filters: 1
  > ZSTD: COMPRESSION_LEVEL=22

### Attribute ###
- Name: n_counts_all
- Type: FLOAT32
- Cell val num: 1
- Filters: 1
  > ZSTD: COMPRESSION_LEVEL=22

### Attribute ###
- Name: n_counts
- Type: FLOAT32
- Cell val num: 1
- Filters: 1
  > ZSTD: COMPRESSION_LEVEL=22

### Attribute ###
- Name: louvain
- Type: STRING_UTF8
- Cell val num: var
- Filters: 1
  > ZSTD: COMPRESSION_LEVEL=22

### Attribute ###
- Name: n_genes
- Type: INT32
- Cell val num: 1
- Filters: 1
  > ZSTD: COMPRESSION_LEVEL=22



(1305994,)

OrderedDict([('name_0', array(['AAACCTGAGATAGGAG-1', 'AAACCTGAGCGGCTTC-1', 'AAACCTGAGGAATCGC-1',
       ..., 'TGCGTGGAGAAACCGC-133', 'TGCGTGGAGAATAGGG-133',
       'TGCGTGGAGATGTCGG-133'], dtype=object)), ('n_counts_all', array([4046., 2087., 4654., ..., 2575., 4662., 3479.], dtype=float32)), ('n_counts', array([ 793.81116,  935.5113 ,  658.3262 , ..., 1326.0614 ,  997.1257 ,
       1113.8953 ], dtype=float32)), ('louvain', array(['5', '2', '6', ..., '12', '26', '19'], dtype=object)), ('n_genes', array([ 85,  76, 114, ...,  67,  79, 103], dtype=int32))])

Traceback (most recent call last):
  File "server/test/tiledb1.py", line 20, in <module>
    print(array[:1305009])
  File "tiledb/libtiledb.pyx", line 4002, in tiledb.libtiledb.DenseArrayImpl.__getitem__
  File "tiledb/libtiledb.pyx", line 4107, in tiledb.libtiledb.DenseArrayImpl.subarray
  File "tiledb/libtiledb.pyx", line 4156, in tiledb.libtiledb.DenseArrayImpl._read_dense_subarray
  File "tiledb/libtiledb.pyx", line 3760, in tiledb.libtiledb.Array._unpack_varlen_query
  File "tiledb/libtiledb.pyx", line 3760, in tiledb.libtiledb.Array._unpack_varlen_query
  File "tiledb/libtiledb.pyx", line 3805, in tiledb.libtiledb.Array._unpack_varlen_query
SystemError: Negative size passed to PyUnicode_FromStringAndSize
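As background on the error message: variable-length string attributes are commonly stored as one flat byte buffer plus a buffer of start offsets, and each cell's size is the difference between consecutive offsets. The sketch below is not TileDB-Py's `_unpack_varlen_query` and makes no claim about the actual cause here. It only illustrates, under that storage assumption, how an inconsistent offset would show up as a negative size. The function name `unpack_varlen` is hypothetical.

```python
from typing import List, Sequence


def unpack_varlen(data: bytes, offsets: Sequence[int]) -> List[str]:
    """Decode UTF-8 cells from a flat buffer; offsets[i] is the start byte of cell i."""
    cells = []
    for i, start in enumerate(offsets):
        # The last cell runs to the end of the data buffer.
        end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
        size = end - start
        if size < 0:
            # A C-level decoder handed this size would reject it in the same spirit.
            raise ValueError(f"cell {i}: negative size {size} (start={start}, end={end})")
        cells.append(data[start:end].decode("utf-8"))
    return cells


print(unpack_varlen(b"5212", [0, 1, 2]))  # ['5', '2', '12']

try:
    unpack_varlen(b"5212", [0, 3, 2])  # offsets out of order
except ValueError as exc:
    print(exc)
```

Checking the sizes before decoding turns an opaque low-level error into a message that names the offending cell.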

@ihnorton
Member

ihnorton commented May 7, 2020 via email
