uncaught exception when operating on s3 storage #409

Closed
ilveroluca opened this issue Nov 6, 2020 · 10 comments

@ilveroluca

Hello,

I’m trying to port a Python program from an HDFS back end to S3. While running some simple tests I’m getting an uncaught exception from TileDB. As the S3 back end I generally use minio for development, though I've reproduced the same issue using a Ceph object store. Here’s the condensed example:

tdmqc@3507db1a1ae5:/tdmq-dist$ python3
Python 3.6.9 (default, Oct  8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tiledb
>>> tiledb.__version__
'0.7.0'
>>> service_info = {
...         'version' : '0.1',
...         'tiledb' : {
...             'storage.root' : 's3://firstbucket/',
...             'config': {
...                 "vfs.s3.aws_access_key_id": "tdm-user",
...                 "vfs.s3.aws_secret_access_key": "tdm-user-s3",
...                 "vfs.s3.endpoint_override": "minio:9000",
...                 "vfs.s3.scheme": "http",
...                 "vfs.s3.region": "",
...                 "vfs.s3.verify_ssl": "false",
...                 "vfs.s3.use_virtual_addressing": "false",
...                 "vfs.s3.use_multipart_upload": "false",
...                 "vfs.s3.logging_level": 'TRACE'
...                 }
...             }
...         }
>>> def clean_s3(tdmq_s3_service_info):
...     import tiledb
...     config = tiledb.Config(params=tdmq_s3_service_info['tiledb']['config'])
...     bucket = tdmq_s3_service_info['tiledb']['storage.root']
...     assert bucket.startswith('s3://')
...     ctx = tiledb.Ctx(config=config)
...     vfs = tiledb.VFS(ctx=ctx)
...     if vfs.is_bucket(bucket):
...         vfs.empty_bucket(bucket)
...     else:
...         vfs.create_bucket(bucket)
...     return tdmq_s3_service_info

The first one or two times I call clean_s3(service_info) it works fine.

>>> clean_s3(service_info)
log4j:WARN File option not set for appender [FSLOGGER].
log4j:WARN Are you using FileAppender instead of ConsoleAppender?
{'version': '0.1', 'tiledb': {'storage.root': 's3://firstbucket/', 'config': {'vfs.s3.aws_access_key_id': 'tdm-user', 'vfs.s3.aws_secret_access_key': 'tdm-user-s3', 'vfs.s3.endpoint_override': 'minio:9000', 'vfs.s3.scheme': 'http', 'vfs.s3.region': '', 'vfs.s3.verify_ssl': 'false', 'vfs.s3.use_virtual_addressing': 'false', 'vfs.s3.use_multipart_upload': 'false', 'vfs.s3.logging_level': 'TRACE'}}}
>>> clean_s3(service_info)
{'version': '0.1', 'tiledb': {'storage.root': 's3://firstbucket/', 'config': {'vfs.s3.aws_access_key_id': 'tdm-user', 'vfs.s3.aws_secret_access_key': 'tdm-user-s3', 'vfs.s3.endpoint_override': 'minio:9000', 'vfs.s3.scheme': 'http', 'vfs.s3.region': '', 'vfs.s3.verify_ssl': 'false', 'vfs.s3.use_virtual_addressing': 'false', 'vfs.s3.use_multipart_upload': 'false', 'vfs.s3.logging_level': 'TRACE'}}}

Then something breaks. Further calls to the function result in an uncaught exception in TileDB:

>>> clean_s3(service_info)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 8, in clean_s3
  File "tiledb/libtiledb.pyx", line 5511, in tiledb.libtiledb.VFS.is_bucket
  File "tiledb/libtiledb.pyx", line 481, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 466, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: Error: Internal TileDB uncaught exception; basic_string::compare: __pos (which is 18446744073709551615) > this->size() (which is 4)
>>> clean_s3(service_info)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 8, in clean_s3
  File "tiledb/libtiledb.pyx", line 5511, in tiledb.libtiledb.VFS.is_bucket
  File "tiledb/libtiledb.pyx", line 481, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 466, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: Error: Internal TileDB uncaught exception; basic_string::compare: __pos (which is 18446744073709551615) > this->size() (which is 4)

Once this happens I can't seem to do any S3 operation. Here is some other information I have collected:

  • Reinitializing the tiledb context has no effect.
  • Trying to reload the module has no effect (i.e., importlib.reload(tiledb)).
  • I can reproduce this with both minio and Ceph.
  • Things usually break after the first couple of calls to the function, typically made in quick succession. Sometimes it happens on the first call.
  • I've tried distancing subsequent calls to clean_s3 by as much as 15 seconds, but the thing still breaks.
  • I'm running in Docker, in a custom ubuntu-based image. TileDB-Py is installed via pip.
  • I have condensed this example from a more complicated scenario and another example I put together (which I posted in the forum). In those cases, the exception was being generated by tiledb.object_type and by tiledb.DenseArray.create.

Given the value of __pos, I guess it's got something to do with an unsigned type being used where a signed one is expected -- or maybe not. Let me know if I can be of help.
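For reference, the reported `__pos` value is exactly 2**64 - 1, which is what -1 becomes when stored in an unsigned 64-bit integer (it is also the value of `std::string::npos` on typical 64-bit platforms). The sketch below only checks that arithmetic with `ctypes`; whether a failed string search is actually involved is a guess and is not confirmed anywhere in this thread.

```python
import ctypes

# Value copied from the error message in the traceback.
REPORTED_POS = 18446744073709551615

# ctypes does no overflow checking, so -1 wraps around to 2**64 - 1
# when it is stored in an unsigned 64-bit integer.
wrapped = ctypes.c_uint64(-1).value

print(wrapped == REPORTED_POS)
print(REPORTED_POS == 2**64 - 1)
```

This supports the "unsigned where signed was expected" reading of the message, but it does not identify where in the code the wrap-around happens.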

@joe-maley joe-maley self-assigned this Nov 6, 2020
@joe-maley

It looks like it's failing on the is_bucket line. To rule out the state in the object store, would you please try reproducing by only calling is_bucket, e.g. removing calls to empty_bucket and create_bucket?

For example:

service_info = {
        'version' : '0.1',
        'tiledb' : {
            'storage.root' : 's3://firstbucket/',
            'config': {
                "vfs.s3.aws_access_key_id": "tdm-user",
                "vfs.s3.aws_secret_access_key": "tdm-user-s3",
                "vfs.s3.endpoint_override": "minio:9000",
                "vfs.s3.scheme": "http",
                "vfs.s3.region": "",
                "vfs.s3.verify_ssl": "false",
                "vfs.s3.use_virtual_addressing": "false",
                "vfs.s3.use_multipart_upload": "false",
                "vfs.s3.logging_level": 'TRACE'
                }
            }
        }

def clean_s3(tdmq_s3_service_info):
  import tiledb
  config = tiledb.Config(params=tdmq_s3_service_info['tiledb']['config'])
  bucket = tdmq_s3_service_info['tiledb']['storage.root']
  assert bucket.startswith('s3://')
  print(bucket)
  ctx = tiledb.Ctx(config=config)
  vfs = tiledb.VFS(ctx=ctx)

  is_bucket = vfs.is_bucket(bucket)
  print(is_bucket)

clean_s3(service_info)

@ihnorton
Member

ihnorton commented Nov 6, 2020

@ilveroluca I guess you are using this dockerfile. Can you confirm that the hadoop-base used there is this one, building from ubuntu:latest: https://hub.docker.com/r/crs4/hadoop-base/dockerfile?

@ilveroluca
Author

@ihnorton I confirm, that is my dockerfile. This is my base image: crs4/hadoopclient:3.2.0

@ilveroluca
Author

@joe-maley

>>> def clean_s3(tdmq_s3_service_info):
...   import tiledb
...   config = tiledb.Config(params=tdmq_s3_service_info['tiledb']['config'])
...   bucket = tdmq_s3_service_info['tiledb']['storage.root']
...   assert bucket.startswith('s3://')
...   print(bucket)
...   ctx = tiledb.Ctx(config=config)
...   vfs = tiledb.VFS(ctx=ctx)
...   is_bucket = vfs.is_bucket(bucket)
...   print(is_bucket)

>>> clean_s3(service_info)
s3://firstbucket/
True
>>> 
>>> clean_s3(service_info)
s3://firstbucket/
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in clean_s3
  File "tiledb/libtiledb.pyx", line 5511, in tiledb.libtiledb.VFS.is_bucket
  File "tiledb/libtiledb.pyx", line 481, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 466, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: Error: Internal TileDB uncaught exception; basic_string::compare: __pos (which is 18446744073709551615) > this->size() (which is 4)
>>> clean_s3(service_info)
s3://firstbucket/
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in clean_s3
  File "tiledb/libtiledb.pyx", line 5511, in tiledb.libtiledb.VFS.is_bucket
  File "tiledb/libtiledb.pyx", line 481, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 466, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: Error: Internal TileDB uncaught exception; basic_string::compare: __pos (which is 18446744073709551615) > this->size() (which is 4)
>>>

@joe-maley

joe-maley commented Nov 6, 2020

@ilveroluca Thank you!

There appears to be a bug that could manifest with a similar signature when the configuration parameter "vfs.s3.ca_file" is unset. Could you try setting that configuration parameter? On ubuntu, try setting this to "/etc/ssl/certs/ca-certificates.crt".

EDIT: Disregard, there is not a bug related to an empty "vfs.s3.ca_file".

@ihnorton
Member

ihnorton commented Nov 6, 2020

Hi @ilveroluca,
So far we have not been able to reproduce the bug ourselves. Could you run a test to help us debug? We have built a set of wheels with some extra debugging printouts, which can be downloaded from here. Then you can install the wheel for your platform with (for example): pip install tiledb-0.7.0d-cp37-cp37m-macosx_10_13_x86_64.whl

These wheels are built from this CI pipeline, which in turn is built from this branch of TileDB-Py.

In case you want to build yourself, the branch of libtiledb that is used for these debug wheels is here: https://github.com/TileDB-Inc/TileDB/tree/jpm/tiledb-py-409-debu

You can target it locally by changing setup.py like this:

diff --git a/setup.py b/setup.py
index fa672a6..5cc7a1d 100644
--- a/setup.py
+++ b/setup.py
@@ -41,7 +41,7 @@ from sys import version_info as ver
 print("setup.py sys.argv is: ", sys.argv)

 # Target branch
-TILEDB_VERSION = "2.1.2"
+TILEDB_VERSION = "3bc2e0c76f8b95c13c7541c7970130f91877e2b1"
 # allow overriding w/ environment variable
 TILEDB_VERSION = os.environ.get("TILEDB_VERSION") or TILEDB_VERSION
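The last line of the diff context suggests the target can also be chosen through the `TILEDB_VERSION` environment variable without editing setup.py. Below is a minimal sketch of how that `or` fallback behaves; `resolve_tiledb_version` is a hypothetical helper written for illustration and is not part of setup.py.

```python
import os

DEFAULT_TILEDB_VERSION = "2.1.2"  # default shown in the diff above


def resolve_tiledb_version(environ=None, default=DEFAULT_TILEDB_VERSION):
    """Mirror `os.environ.get("TILEDB_VERSION") or TILEDB_VERSION`."""
    environ = os.environ if environ is None else environ
    # `or` falls back to the default when the variable is unset or empty.
    return environ.get("TILEDB_VERSION") or default


commit = "3bc2e0c76f8b95c13c7541c7970130f91877e2b1"  # hash from the diff above
print(resolve_tiledb_version({}))
print(resolve_tiledb_version({"TILEDB_VERSION": commit}))
```

Note that an empty `TILEDB_VERSION` is treated the same as an unset one, because an empty string is falsy in Python.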

@ihnorton
Member

ihnorton commented Nov 6, 2020

Once you have installed the test wheels linked above, please run your script again and send us the output. It should print a number of [DEBUG] lines as the program runs, which will help us trace the issue. (If you prefer to send the log privately, please email joe and isaiah @ tiledb.com.)

@ilveroluca
Author

Hi guys,

I installed drop/tiledb-0.7.0d-cp36-cp36m-manylinux1_x86_64.whl on the image and ran the clean_s3 function a few times:

python3 -c 'from script import service_info, clean_s3; [ clean_s3(service_info) for i in range(4) ]' > test.log 2>&1

I have attached the output here: test.log

@ihnorton
Member

ihnorton commented Nov 9, 2020

Hi @ilveroluca, thanks. We think we have identified the problem and will update when the fix is available (it requires upgrading the AWS library we use for S3). In the meantime, we think you may be able to work around it by setting "vfs.s3.region" to a specific value. I believe the default is us-east-1 on the minio side, so I would suggest trying that first. Otherwise, if you are using the MINIO_REGION or --region options to customize the region, make sure the TileDB config matches it. Please let us know if that works.
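A minimal sketch of that workaround, assuming the same config keys used earlier in this thread: `with_explicit_region` and `make_vfs` are hypothetical helper names, and `us-east-1` is only the suggested default, so substitute whatever region the object store is actually configured with.

```python
def with_explicit_region(s3_config, region="us-east-1"):
    """Return a copy of the config with an empty or missing region filled in."""
    patched = dict(s3_config)
    if not patched.get("vfs.s3.region"):
        patched["vfs.s3.region"] = region
    return patched


def make_vfs(s3_config):
    """Build a VFS using the same TileDB calls as clean_s3 in this thread."""
    import tiledb  # imported lazily so the helper above works without TileDB

    config = tiledb.Config(params=with_explicit_region(s3_config))
    ctx = tiledb.Ctx(config=config)
    return tiledb.VFS(ctx=ctx)


config = {
    "vfs.s3.endpoint_override": "minio:9000",
    "vfs.s3.scheme": "http",
    "vfs.s3.region": "",  # the empty value from the original report
}
patched = with_explicit_region(config)
print(patched["vfs.s3.region"])
```

The original dictionary is left untouched, and a region that is already set is never overwritten.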

@ilveroluca
Author

Indeed, setting the region seems to work around the problem. Great job debugging the problem, guys.

joe-maley pushed a commit to TileDB-Inc/TileDB that referenced this issue Nov 10, 2020
This bumps the AWSSDK to the latest dot-release of 1.8 to fix:
TileDB-Inc/TileDB-Py#409
joe-maley pushed a commit to TileDB-Inc/TileDB that referenced this issue Nov 11, 2020
This bumps the AWSSDK to the latest dot-release of 1.8 to fix:
TileDB-Inc/TileDB-Py#409
@ihnorton ihnorton mentioned this issue Nov 18, 2020