
adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory #1283

Closed
aab200 opened this issue Nov 10, 2023 · 37 comments · Fixed by #1297 or #1696

Comments

@aab200

aab200 commented Nov 10, 2023

Hello,

I was trying the ADBC Python Snowflake driver and I'm seeing something odd with this simple function:

import datetime

# Query via ADBC and return the result as an Arrow table or a pandas DataFrame
def query_via_adbc_arrow(conn_adbc, query, return_type='arrow'):
    print("Enters query_via_adbc_arrow")
    start_time = datetime.datetime.now()
    cursor = conn_adbc.cursor()
    cursor.execute(query)
    # Return pandas or arrow
    if return_type == 'arrow':
        df = cursor.fetch_arrow_table()
        num_rows = df.num_rows
    else:
        df = cursor.fetch_df()  # equivalent to cursor.fetch_arrow_table().to_pandas()
        num_rows = df.shape[0]
    cursor.close()
    end_time = datetime.datetime.now()
    execution_time = (end_time - start_time).total_seconds()
    print(f"Got {num_rows} rows in {return_type}. The execution time is {execution_time} secs")
    print("Exits query_via_adbc_arrow")
    return df

Sometimes it works and sometimes it does not (the regular Snowflake Python connector works every time).
When we run the ADBC part, there are weird errors:
(310) [aborissov@bhsys-data-dev-euw1c-lnx10 python_project2]$ python3 snow_python_adbc.py
Enters query_via_adbc_arrow
Got 10003143 rows in arrow . The execution time is 1.62057 secs
Exits query_via_adbc_arrow
Enters query_via_adbc_arrow
Traceback (most recent call last):
File "/<>/python_project2/snow_python_adbc.py", line 133, in
df = query_via_adbc_arrow(conn_adbc, "select * from private.main_td_limit_orders", return_type='pandas')
File "/bh<>/snow_python_adbc.py", line 91, in query_via_adbc_arrow
df = cursor.fetch_df() # Does not work well with large data running out of memory
File "/<>/venvs/310/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 1050, in fetch_df
return self._results.fetch_df()
File "<>/venvs/310/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 1139, in fetch_df
return self._reader.read_pandas()
File "adbc_driver_manager/_reader.pyx", line 108, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_pandas
File "adbc_driver_manager/_reader.pyx", line 40, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory
(310) [<>python_project2]$ python3 snow_python_adbc.py
Enters query_via_adbc_arrow
Got 10003143 rows in arrow . The execution time is 1.535479 secs
Exits query_via_adbc_arrow
Enters query_via_adbc_arrow
Traceback (most recent call last):
File "/<>/python_project2/python_project2/snow_python_adbc.py", line 133, in
df = query_via_adbc_arrow(conn_adbc, "select * from private.main_td_limit_orders", return_type='pandas')
File "/<>/python_project2/python_project2/snow_python_adbc.py", line 91, in query_via_adbc_arrow
df = cursor.fetch_df() # Does not work well with large data running out of memory
File "/<>/venvs/310/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 1050, in fetch_df
return self._results.fetch_df()
File "<>/venvs/310/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 1139, in fetch_df
return self._reader.read_pandas()
File "adbc_driver_manager/_reader.pyx", line 108, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_pandas
File "adbc_driver_manager/_reader.pyx", line 40, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory
(310) [<> python_project2]$ python3 snow_python_adbc.py
Enters query_via_adbc_arrow
Got 10003143 rows in arrow . The execution time is 1.560988 secs
Exits query_via_adbc_arrow
Enters query_via_adbc_arrow
Got 10003143 rows in pandas . The execution time is 2.706668 secs
Exits query_via_adbc_arrow

Has anyone experienced that ? Thanks
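Since the failure is intermittent, a stdlib-only retry wrapper can paper over it while the underlying driver bug is investigated. This is a hypothetical helper of my own, not part of ADBC:

```python
import time

def retry(fn, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call fn(), retrying up to `attempts` times on the given exceptions.

    A stopgap for intermittent fetch failures; the driver bug itself
    still needs fixing.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions as e:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"Attempt {attempt} failed ({e}); retrying in {delay}s")
            time.sleep(delay)

# Usage (hypothetical, against the function above):
# df = retry(lambda: query_via_adbc_arrow(conn_adbc, query))
```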

@aab200
Author

aab200 commented Nov 10, 2023

Just to clarify I am using
adbc-driver-snowflake 0.7.0
snowflake-connector-python 3.4.0
snowflake-snowpark-python 1.9.0
arrow 1.3.0
pyarrow 10.0.1

@bascheibler

I've just experienced the same issue (exception UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory).

I'm running the code snippet below in a Docker container based on the python:3.12 image. The package versions are:

  • pyarrow==14.0.1
  • adbc_driver_snowflake==0.8.0

Here's the code snippet (not the full function's code):

import adbc_driver_snowflake.dbapi

def export_table(schema_name, table_name):
    query = f"select * from {schema_name}.{table_name}"

    with adbc_driver_snowflake.dbapi.connect(
        uri=snowflake_uri,  # defined elsewhere
        db_kwargs={"adbc.snowflake.sql.client_option.use_high_precision": "false"},
    ) as conn:
        with conn.cursor() as cursor:
            print(f"Querying {schema_name}.{table_name}")
            cursor.execute(query)
            try:
                table = cursor.fetch_arrow_table()
            except Exception as e:
                print(f"{schema_name}.{table_name} got exception {e}")

The function above fails after fetching a few tables. I've tried wrapping it in a for loop and also in a multiprocessing Pool; neither succeeds.

@lidavidm
Member

About how large is the dataset you're trying to fetch?

The ADBC driver tries to buffer the parts of the dataset concurrently. You could try setting the options to limit the queue size and concurrency to cut down on memory usage. (We could/should also probably limit the overall buffer size based on memory usage, I suspect.) https://arrow.apache.org/adbc/current/driver/snowflake.html#performance
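The two knobs above can be passed as statement options. A minimal sketch: the option strings come from the linked driver docs, while the helper name and defaults are my own, not part of the library:

```python
# Option strings documented for the ADBC Snowflake driver.
RESULT_QUEUE_SIZE = "adbc.rpc.result_queue_size"                  # chunks buffered ahead of the reader
PREFETCH_CONCURRENCY = "adbc.snowflake.rpc.prefetch_concurrency"  # parallel downloads

def throttled_fetch_options(queue_size=5, concurrency=2):
    """Build a statement-option dict that bounds the prefetch buffer.

    ADBC option values are passed as strings.
    """
    return {
        RESULT_QUEUE_SIZE: str(queue_size),
        PREFETCH_CONCURRENCY: str(concurrency),
    }

# Usage with the DB-API cursor (requires a live Snowflake connection):
# cursor.adbc_statement.set_options(**throttled_fetch_options(5, 2))
# cursor.execute(query)
```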

@lidavidm
Member

"fails after fetching a few tables"

The other thing I would suspect here is if we somehow are keeping a reference to the dataset when we should not be...

CC @zeroshade for any ideas too

@aab200
Author

aab200 commented Nov 10, 2023

10 million rows, 166 MB (in Snowflake), 19 columns.

@lidavidm lidavidm added this to the ADBC Libraries 0.9.0 milestone Nov 10, 2023
@bascheibler

bascheibler commented Nov 10, 2023

"fails after fetching a few tables"

The other thing I would suspect here is if we somehow are keeping a reference to the dataset when we should not be...

CC @zeroshade for any ideas too

Actually, it has just failed on the 1st or 2nd execution of the pool. And I can see that this time it was probably cursor.execute() that failed.

The order of tables that are read differs slightly on each run, so it might really be associated with table size. I'll try setting the queue size and concurrency and will keep you posted. Thanks.

edit: here's the Go output, in addition to the Exception above:

panic: close of nil channel

goroutine 1720 [running]:
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader.func4()
        /adbc/go/adbc/driver/snowflake/record_reader.go:572 +0x85
created by github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader
        /adbc/go/adbc/driver/snowflake/record_reader.go:568 +0xbc8

@lidavidm
Member

Ok. The Go traceback indicates an actual bug that needs fixing.

The dataset described doesn't sound big enough to cause memory issues, either. (Unless it's very highly compressed while in Snowflake.) So something seems off.

@lidavidm
Member

goroutine 1720 [running]:
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader.func4()
        /adbc/go/adbc/driver/snowflake/record_reader.go:572 +0x85
created by github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader
        /adbc/go/adbc/driver/snowflake/record_reader.go:568 +0xbc8

@zeroshade from looking at the code, it seems a post-0.8.0 refactor should already have fixed this condition, do you agree?

@zeroshade
Member

zeroshade commented Nov 13, 2023

Sorry for the delay in me taking a look at this...

@zeroshade from looking at the code, it seems a post-0.8.0 refactor should already have fixed this condition, do you agree?

Looking at the code I think I'd agree with you. @aab200, @bascheibler are either of you able to reproduce this error/failure if you build from master?

@bascheibler

I've built from master (it installed adbc_driver_snowflake version '0.0.0+gff222e20' following the instructions here) and I'm still getting some errors, as the examples listed below:

panic: close of nil channel

goroutine 232 [running]:
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader.func4()
        /adbc/go/adbc/driver/snowflake/record_reader.go:572 +0x85
created by github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader
        /adbc/go/adbc/driver/snowflake/record_reader.go:568 +0xbc8

or (with more info printed from traceback.format_exc())

Traceback (most recent call last):
  File "/app/snowflake.py", line 41, in export_table
    Table = cursor.fetch_arrow_table()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/dbapi.py", line 1032, in fetch_arrow_table
    return self._results.fetch_arrow_table()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/dbapi.py", line 1135, in fetch_arrow_table
    return self._reader.read_all()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "adbc_driver_manager/_reader.pyx", line 90, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all
  File "adbc_driver_manager/_reader.pyx", line 40, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory

The tables that failed have 2.2M, 98K, and 13.3K rows at 48.0 MB, 41.9 MB, and 229.5 KB, respectively.

Python packages:

Please let me know if there's anything else I could do to help debugging this. I'd be happy to contribute.

@zeroshade
Member

@bascheibler could you provide any kind of equivalent dataset via parquet/csv/otherwise that I could easily load into my own snowflake instance to attempt to reproduce this?

@bascheibler

The ADBC driver tries to buffer the parts of the dataset concurrently. You could try setting the options to limit the queue size and concurrency to cut down on memory usage. (We could/should also probably limit the overall buffer size based on memory usage, I suspect.) https://arrow.apache.org/adbc/current/driver/snowflake.html#performance

These AdbcStatement options really do seem to indicate it's a performance issue. I set prefetch_concurrency to 1 and it significantly increased the number of tables that fail (~80%). Before that, only approx. 25% were failing. If I set this param to 75 and result_queue_size to 500, on the other hand, the error rate drops to between 5% and 10% (I'll keep playing with these parameters to see if I can get to 0%).

The refactored code is below. I had to use lower-level code to be able to set the statement options.

import logging

import adbc_driver_manager
import adbc_driver_snowflake
import pyarrow

def export_table_low_level(schema_name, table_name):
    logging.debug(f"Starting download of {schema_name}.{table_name}")
    query = f"select * from {schema_name}.{table_name}"

    with adbc_driver_snowflake.connect(
        uri=snowflake_uri,  # defined elsewhere
        db_kwargs={
            "adbc.snowflake.sql.client_option.use_high_precision": "false"
        }
    ) as db:
        with adbc_driver_manager.AdbcConnection(db) as conn:
            with adbc_driver_manager.AdbcStatement(conn) as stmt:
                stmt.set_options(
                    **{
                        adbc_driver_snowflake.StatementOptions.PREFETCH_CONCURRENCY.value: "1"
                    }
                )
                stmt.set_sql_query(query)
                stream, _ = stmt.execute_query()
                reader = pyarrow.RecordBatchReader._import_from_c(stream.address)
                return reader.read_all()

@lidavidm
Member

Hmm, so increasing the concurrency/aggressiveness is what reduces it...? That doesn't sound like an out-of-memory issue, then.

@zeroshade
Member

Maybe timeouts or something on the Snowflake side? Yeah, this would be easiest if we had a dataset that can reproduce this. I'll try tomorrow to see if I can generate one, unless @bascheibler is able to provide something roughly equivalent to what he's using (I assume the actual data you're using likely can't be released publicly) that triggers the errors.

@bascheibler

bascheibler commented Nov 14, 2023

Here's a repo with both the dataset and the scripts to reproduce the issue: https://github.com/bascheibler/arrow-adbc-issue-1283

I've created only 3 test samples with hashed data from some of the tables that failed. Unfortunately, I wasn't able to test this new dataset because the adbc_ingest function is either not committing or its payload is too large (please see my comment in the main.py file). The tables that get created have 0 rows in Snowflake. However, please feel free to use these parquet files however works best for you.

Please let me know if this data is enough to reproduce the issues. We may need to force a longer loop with repeated tables in order to get an error. Not sure how it behaves with a shorter list of tables.

@shollington-rbi

We are also seeing this issue sporadically when executing fetch_arrow_table() in an AWS Lambda Python script connecting to Snowflake. @bascheibler - were you able to find a parameter configuration that minimizes the frequency?

@zeroshade
Member

By setting the prefetch concurrency option to "1", I was able to reproduce the close of nil channel error with a pure Go main program. I'm going to dig in to try to determine a solution.

@zeroshade
Member

*sigh* As with any complex system, it came down to a race condition. @shollington-rbi @bascheibler @aab200, could any of you try pulling down the branch from the linked PR and verifying it fixes the problem for you?

@bascheibler

I believe this branch hasn't totally fixed the issue yet. It is still returning the panic: close of nil channel error.
My repo has been fixed and now it's able to reproduce the issue end-to-end. To test it from the race-conditions-suck branch, just make sure that the REPRODUCE_ISSUE build arg is set to false.

@zeroshade
Member

I'll try to use the updated repo scripts and see if I can reproduce it myself and figure out what I'm missing.

@zeroshade
Member

I don't think your Dockerfile is picking up the updated version of the lib from my branch when it runs. I was able to reproduce the close of nil channel error, but the stack trace matches the v0.8.0 version of the lib, not the updated version from my branch with the fix. I'm going to see if I can figure out how to update the Dockerfile so that it properly picks it up.

@bascheibler

OK, thanks for checking that. I added that pip freeze step in the Dockerfile just to make sure it wasn't installing the 0.8.0 version, but I guess I'm missing something.

FYI: in the meantime, I've switched to snowflake-connector-python (version 3.5.0) and it's working fine so far. It's based on the nanoarrow project, right? Just checking.

@lidavidm
Member

It is based on nanoarrow, but is unrelated to anything in ADBC.

@zeroshade
Member

So after getting it to run with the updated build, I now got the failure to allocate memory error that was originally filed. I haven't been able to reproduce that with pure Go yet. So I'm not calling this solved just yet until I can determine the cause of that allocation error.

@zeroshade
Member

So I managed to reproduce the "cannot allocate memory" error locally and interactively. With a bit of manipulation I was able to get stack traces and look closer at the situation. The "cannot allocate memory" error is being directly returned by the call to calloc in Mallocator.Allocate. The request it fails on seems to be somewhere between 800 KB and a bit over 1 MB, so it's not failing due to an invalid (0 or negative) allocation size. The failure is also happening while I still have more than 10-12 GB of RAM available.

I'm only able to reproduce this issue via the python script in the repo provided by @bascheibler, I haven't been able to create a reproducer with just Go.

@zeroshade
Member

Either way, I don't think this is related to the race condition, so I think the race condition should be fixed on its own and then we can look into the "cannot allocate memory" separately. I'll keep digging into this but it just seems REALLY strange to me

@lidavidm
Member

What were both of the parameters for calloc?

@zeroshade
Member

First argument somewhere between 800,000 and 1,200,000; second argument is 1.
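For context, an allocation of that size is tiny by libc standards. A quick stdlib-only check (illustrative only; the driver's Go Mallocator wraps the same libc calloc, but this is not the driver's code path) shows such a request normally succeeds:

```python
import ctypes
import ctypes.util

# Load the C runtime and declare calloc/free signatures explicitly so the
# returned pointer isn't truncated to a 32-bit int.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

# Mirror the failing request: roughly 1 MB, as (nmemb ~1e6, size 1).
ptr = libc.calloc(1_200_000, 1)
print("calloc succeeded:", ptr is not None)
if ptr:
    libc.free(ptr)
```

On a healthy system with gigabytes free, calloc only returns NULL here under address-space exhaustion or resource limits, which is part of why this failure looked so strange.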

@lidavidm
Member

@zeroshade did that PR resolve both issues?

@zeroshade
Member

Gah, crap. No. It didn't resolve the unable to allocate issue. I forgot to update it before merging. My mistake

@zeroshade
Member

apache/arrow#40902 provides an upstream fix for this issue

@lidavidm lidavidm added this to the ADBC Libraries 1.0.0 milestone Mar 29, 2024
lidavidm pushed a commit that referenced this issue Apr 1, 2024
Fixes #1283 by incorporating the upstream fix
@ir3456

ir3456 commented Apr 23, 2024

I am still seeing this issue when running on machines running Ubuntu. No problem running on Mac.

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py:1089, in Cursor.fetch_arrow_table(self)
   1084 if self._results is None:
   1085     raise ProgrammingError(
   1086         "Cannot fetch_arrow_table() before execute()",
   1087         status_code=_lib.AdbcStatusCode.INVALID_STATE,
   1088     )
-> 1089 return self._results.fetch_arrow_table()

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py:1198, in _RowIterator.fetch_arrow_table(self)
   1197 def fetch_arrow_table(self) -> pyarrow.Table:
-> 1198     return _blocking_call(self._reader.read_all, (), {}, self._stmt.cancel)

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/_lib.pyx:1552, in adbc_driver_manager._lib._blocking_call_impl()

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/_lib.pyx:1545, in adbc_driver_manager._lib._blocking_call_impl()

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/_reader.pyx:90, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all()

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/_reader.pyx:40, in adbc_driver_manager._reader._AdbcErrorHelper.check_error()

OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory

In [3]: exit
(venv) dev@updated-backtest-5:~/isaac$ pip freeze | grep arrow
pyarrow==16.0.0
(venv) dev@updated-backtest-5:~/isaac$ pip freeze | grep adbc
adbc-driver-manager==0.11.0
adbc-driver-snowflake==0.11.0

I have the latest versions of pyarrow, adbc-driver-manager, and adbc-driver-snowflake. Is there something else I should be updating?

@lidavidm
Member

Oops, I think 0.11.0 didn't actually incorporate the fix (the release happened right before the fix got in).

There are nightly wheels here, do you want to try them? (Download the artifact "python-amd64-manylinux2014", extract it, and manually pip install path/to/wheel.whl) https://github.com/apache/arrow-adbc/actions/runs/8792621756

@ir3456

ir3456 commented Apr 24, 2024

Thanks for the help! That is working as expected and fixes the bug we were seeing. Any idea when we can expect this to be released?

@lidavidm
Member

I plan to kick off the release in the next week or two, actually. That will be called "1.0" because I believe the Snowflake driver (and a few others) are generally ready. (Still deciding whether I will spend the time to try to decouple our version numbers.)

So TL;DR by mid-May, if all goes well.

@ir3456

ir3456 commented Apr 24, 2024

Is there any chance that there will be a patch release 0.11.1 that we will be able to use before the 1.0?

@lidavidm
Member

For us all releases take an equal amount of effort so it would functionally be the same as waiting for the next release.
