
adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory #1283

Closed
aab200 opened this issue Nov 10, 2023 · 37 comments · Fixed by #1297 or #1696

Comments

@aab200

aab200 commented Nov 10, 2023

Hello,

I was trying the ADBC Python Snowflake driver and I'm seeing something odd with this simple function:

import datetime

# Query via ADBC and return the result as an Arrow table or a pandas DataFrame
def query_via_adbc_arrow(conn_adbc, query, return_type='arrow'):
    print("Enters query_via_adbc_arrow")
    start_time = datetime.datetime.now()
    cursor = conn_adbc.cursor()
    cursor.execute(query)
    # Return pandas or arrow
    if return_type == 'arrow':
        df = cursor.fetch_arrow_table()
        num_rows = df.num_rows
    else:
        df = cursor.fetch_df()  # equivalent to cursor.fetch_arrow_table().to_pandas()
        num_rows = df.shape[0]
    cursor.close()
    end_time = datetime.datetime.now()
    execution_time = (end_time - start_time).total_seconds()
    print(f"Got {num_rows} rows in {return_type}. The execution time is {execution_time} secs")
    print("Exits query_via_adbc_arrow")
    return df

Sometimes it works and sometimes it does not (the regular Snowflake Python connector works every time).
When we run the ADBC part, there are weird errors:
(310) [aborissov@bhsys-data-dev-euw1c-lnx10 python_project2]$ python3 snow_python_adbc.py
Enters query_via_adbc_arrow
Got 10003143 rows in arrow . The execution time is 1.62057 secs
Exits query_via_adbc_arrow
Enters query_via_adbc_arrow
Traceback (most recent call last):
File "/<>/python_project2/snow_python_adbc.py", line 133, in
df = query_via_adbc_arrow(conn_adbc, "select * from private.main_td_limit_orders", return_type='pandas')
File "/bh<>/snow_python_adbc.py", line 91, in query_via_adbc_arrow
df = cursor.fetch_df() # Does not work well with large data running out of memory
File "/<>/venvs/310/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 1050, in fetch_df
return self._results.fetch_df()
File "<>/venvs/310/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 1139, in fetch_df
return self._reader.read_pandas()
File "adbc_driver_manager/_reader.pyx", line 108, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_pandas
File "adbc_driver_manager/_reader.pyx", line 40, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory
(310) [<>python_project2]$ python3 snow_python_adbc.py
Enters query_via_adbc_arrow
Got 10003143 rows in arrow . The execution time is 1.535479 secs
Exits query_via_adbc_arrow
Enters query_via_adbc_arrow
Traceback (most recent call last):
File "/<>/python_project2/python_project2/snow_python_adbc.py", line 133, in
df = query_via_adbc_arrow(conn_adbc, "select * from private.main_td_limit_orders", return_type='pandas')
File "/<>/python_project2/python_project2/snow_python_adbc.py", line 91, in query_via_adbc_arrow
df = cursor.fetch_df() # Does not work well with large data running out of memory
File "/<>/venvs/310/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 1050, in fetch_df
return self._results.fetch_df()
File "<>/venvs/310/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py", line 1139, in fetch_df
return self._reader.read_pandas()
File "adbc_driver_manager/_reader.pyx", line 108, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_pandas
File "adbc_driver_manager/_reader.pyx", line 40, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory
(310) [<> python_project2]$ python3 snow_python_adbc.py
Enters query_via_adbc_arrow
Got 10003143 rows in arrow . The execution time is 1.560988 secs
Exits query_via_adbc_arrow
Enters query_via_adbc_arrow
Got 10003143 rows in pandas . The execution time is 2.706668 secs
Exits query_via_adbc_arrow

Has anyone experienced that ? Thanks
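Since the failure is intermittent, a stdlib-only retry wrapper can paper over it while the underlying driver bug is investigated. This is a hypothetical helper of my own, not part of ADBC:

```python
import time

def retry(fn, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call fn(), retrying up to `attempts` times on the given exceptions.

    A stopgap for intermittent fetch failures; the driver bug itself
    still needs fixing.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions as e:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"Attempt {attempt} failed ({e}); retrying in {delay}s")
            time.sleep(delay)

# Usage (hypothetical, against the function above):
# df = retry(lambda: query_via_adbc_arrow(conn_adbc, query))
```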

@aab200
Author

aab200 commented Nov 10, 2023

Just to clarify I am using
adbc-driver-snowflake 0.7.0
snowflake-connector-python 3.4.0
snowflake-snowpark-python 1.9.0
arrow 1.3.0
pyarrow 10.0.1

@bascheibler

I've just experienced the same issue (exception UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory).

I'm running the code snippet below in a Docker container based on the python:3.12 image. The package versions are:

  • pyarrow==14.0.1
  • adbc_driver_snowflake==0.8.0

Here's the code snippet (not the full function's code):

import adbc_driver_snowflake.dbapi

def export_table(schema_name, table_name):
    query = f"select * from {schema_name}.{table_name}"

    with adbc_driver_snowflake.dbapi.connect(
        uri=snowflake_uri,  # defined elsewhere
        db_kwargs={"adbc.snowflake.sql.client_option.use_high_precision": "false"},
    ) as conn:
        with conn.cursor() as cursor:
            print(f"Querying {schema_name}.{table_name}")
            cursor.execute(query)
            try:
                table = cursor.fetch_arrow_table()
            except Exception as e:
                print(f"{schema_name}.{table_name} got exception {e}")

The function above fails after fetching a few tables. I've tried wrapping it in a for loop and also in a multiprocessing Pool; neither succeeds.

@lidavidm
Member

About how large is the dataset you're trying to fetch?

The ADBC driver tries to buffer the parts of the dataset concurrently. You could try setting the options to limit the queue size and concurrency to cut down on memory usage. (We could/should also probably limit the overall buffer size based on memory usage, I suspect.) https://arrow.apache.org/adbc/current/driver/snowflake.html#performance
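The two knobs above can be passed as statement options. A minimal sketch: the option strings come from the linked driver docs, while the helper name and defaults are my own, not part of the library:

```python
# Option strings documented for the ADBC Snowflake driver.
RESULT_QUEUE_SIZE = "adbc.rpc.result_queue_size"                  # chunks buffered ahead of the reader
PREFETCH_CONCURRENCY = "adbc.snowflake.rpc.prefetch_concurrency"  # parallel downloads

def throttled_fetch_options(queue_size=5, concurrency=2):
    """Build a statement-option dict that bounds the prefetch buffer.

    ADBC option values are passed as strings.
    """
    return {
        RESULT_QUEUE_SIZE: str(queue_size),
        PREFETCH_CONCURRENCY: str(concurrency),
    }

# Usage with the DB-API cursor (requires a live Snowflake connection):
# cursor.adbc_statement.set_options(**throttled_fetch_options(5, 2))
# cursor.execute(query)
```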

@lidavidm
Member

"fails after fetching a few tables"

The other thing I would suspect here is if we somehow are keeping a reference to the dataset when we should not be...

CC @zeroshade for any ideas too

@aab200
Author

aab200 commented Nov 10, 2023

10 million rows, 166 MB (in Snowflake), 19 columns.

@lidavidm lidavidm added this to the ADBC Libraries 0.9.0 milestone Nov 10, 2023
@bascheibler

bascheibler commented Nov 10, 2023

"fails after fetching a few tables"

The other thing I would suspect here is if we somehow are keeping a reference to the dataset when we should not be...

CC @zeroshade for any ideas too

Actually, it has just failed on the 1st or 2nd execution of the pool. And I can see that this time it was probably cursor.execute() that failed.

The order of tables that are read differs slightly on each run, so it might really be associated with table size. I'll try setting the queue size and concurrency and will keep you posted. Thanks.

edit: here's the Go output, in addition to the Exception above:

panic: close of nil channel

goroutine 1720 [running]:
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader.func4()
        /adbc/go/adbc/driver/snowflake/record_reader.go:572 +0x85
created by github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader
        /adbc/go/adbc/driver/snowflake/record_reader.go:568 +0xbc8

@lidavidm
Member

Ok. The Go traceback indicates an actual bug that needs fixing.

The dataset described doesn't sound big enough to cause memory issues, either. (Unless it's very highly compressed while in Snowflake.) So something seems off.

@lidavidm
Member

goroutine 1720 [running]:
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader.func4()
        /adbc/go/adbc/driver/snowflake/record_reader.go:572 +0x85
created by github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader
        /adbc/go/adbc/driver/snowflake/record_reader.go:568 +0xbc8

@zeroshade from looking at the code, it seems a post-0.8.0 refactor should already have fixed this condition, do you agree?

@zeroshade
Member

zeroshade commented Nov 13, 2023

Sorry for the delay in me taking a look at this...

@zeroshade from looking at the code, it seems a post-0.8.0 refactor should already have fixed this condition, do you agree?

Looking at the code I think I'd agree with you. @aab200, @bascheibler are either of you able to reproduce this error/failure if you build from master?

@bascheibler

I've built from master (it installed adbc_driver_snowflake version '0.0.0+gff222e20' following the instructions here) and I'm still getting some errors, as the examples listed below:

panic: close of nil channel

goroutine 232 [running]:
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader.func4()
        /adbc/go/adbc/driver/snowflake/record_reader.go:572 +0x85
created by github.com/apache/arrow-adbc/go/adbc/driver/snowflake.newRecordReader
        /adbc/go/adbc/driver/snowflake/record_reader.go:568 +0xbc8

or (with more info printed from traceback.format_exc())

Traceback (most recent call last):
  File "/app/snowflake.py", line 41, in export_table
    Table = cursor.fetch_arrow_table()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/dbapi.py", line 1032, in fetch_arrow_table
    return self._results.fetch_arrow_table()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/arrow-adbc/python/adbc_driver_manager/adbc_driver_manager/dbapi.py", line 1135, in fetch_arrow_table
    return self._reader.read_all()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "adbc_driver_manager/_reader.pyx", line 90, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all
  File "adbc_driver_manager/_reader.pyx", line 40, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory

The tables that failed have 2.2M, 98K, and 13.3K rows at 48.0 MB, 41.9 MB, and 229.5 KB, respectively.

Python packages:

Please let me know if there's anything else I could do to help debugging this. I'd be happy to contribute.

@zeroshade
Member

@bascheibler could you provide any kind of equivalent dataset via parquet/csv/otherwise that I could easily load into my own snowflake instance to attempt to reproduce this?

@bascheibler

The ADBC driver tries to buffer the parts of the dataset concurrently. You could try setting the options to limit the queue size and concurrency to cut down on memory usage. (We could/should also probably limit the overall buffer size based on memory usage, I suspect.) https://arrow.apache.org/adbc/current/driver/snowflake.html#performance

These AdbcStatement options really do seem to indicate it's a performance issue. I set prefetch_concurrency to 1 and it significantly increased the number of tables that fail (~80%). Before that, only approx. 25% were failing. If I set this param to 75 and result_queue_size to 500, on the other hand, the error rate drops to between 5% and 10% (I'll keep playing with these parameters to see if I can get to 0%).

The refactored code is below. I had to use lower-level code to be able to set the statement options.

import logging

import adbc_driver_manager
import adbc_driver_snowflake
import pyarrow

def export_table_low_level(schema_name, table_name):
    logging.debug(f"Starting download of {schema_name}.{table_name}")
    query = f"select * from {schema_name}.{table_name}"

    with adbc_driver_snowflake.connect(
        uri=snowflake_uri,  # defined elsewhere
        db_kwargs={
            "adbc.snowflake.sql.client_option.use_high_precision": "false"
        }
    ) as db:
        with adbc_driver_manager.AdbcConnection(db) as conn:
            with adbc_driver_manager.AdbcStatement(conn) as stmt:
                stmt.set_options(
                    **{
                        adbc_driver_snowflake.StatementOptions.PREFETCH_CONCURRENCY.value: "1"
                    }
                )
                stmt.set_sql_query(query)
                stream, _ = stmt.execute_query()
                reader = pyarrow.RecordBatchReader._import_from_c(stream.address)
                return reader.read_all()

@lidavidm
Member

Hmm, so increasing the concurrency/aggressiveness is what reduces it...? That doesn't sound like an out-of-memory issue, then.

@zeroshade
Member

Maybe timeouts or something on the Snowflake side? Yeah, this would be easiest if we had a dataset that can reproduce this. I'll try tomorrow to see if I can generate one, unless @bascheibler is able to provide something roughly equivalent to what he's using (I assume the actual data you're using likely can't be released publicly) that triggers the errors.

@bascheibler

bascheibler commented Nov 14, 2023

Here's a repo with both the dataset and the scripts to reproduce the issue: https://github.com/bascheibler/arrow-adbc-issue-1283

I've created only 3 test samples with hashed data from some of the tables that failed. Unfortunately, I wasn't able to test this new dataset because the adbc_ingest function is either not committing or its payload is too large (please see my comment in the main.py file). The tables that get created have 0 rows in Snowflake. However, please feel free to use these parquet files however works best for you.

Please let me know if this data is enough to reproduce the issues. We may need to force a longer loop with repeated tables in order to get an error. Not sure how it behaves with a shorter list of tables.

@shollington-rbi

We are also seeing this issue sporadically when executing fetch_arrow_table() in an AWS Lambda Python script connecting to Snowflake. @bascheibler - were you able to find a parameter configuration that minimizes the frequency?

@zeroshade
Member

By setting the prefetch concurrency option to "1", I was able to reproduce the close of nil channel error with a pure Go main program. I'm going to dig in to try to determine a solution.

@zeroshade
Member

*sigh* As with any complex system, it came down to a race condition. @shollington-rbi @bascheibler @aab200, could any of you try pulling down the branch from the linked PR and verifying it fixes the problem for you?

@bascheibler

I believe this branch hasn't totally fixed the issue yet. It is still returning the panic: close of nil channel error.
My repo has been fixed and now it's able to reproduce the issue end-to-end. To test it from the race-conditions-suck branch, just make sure that the REPRODUCE_ISSUE build arg is set to false.

@zeroshade
Member

I'll try to use the updated repo scripts and see if I can reproduce it myself and figure out what I'm missing.

@zeroshade
Member

I don't think your Dockerfile is picking up the updated version of the lib from my branch when it runs. I was able to reproduce the close of nil channel error, but the stack trace matches the v0.8.0 version of the lib, not the updated version from my branch with the fix. I'm going to see if I can figure out how to update the Dockerfile so that it properly picks it up.

@bascheibler

OK, thanks for checking that. I added that pip freeze step in the Dockerfile just to make sure it wasn't installing the 0.8.0 version, but I guess I'm missing something.

FYI: in the meantime, I've switched to snowflake-connector-python (version 3.5.0) and it's working fine so far. It's based on the nanoarrow project, right? Just checking.

@lidavidm
Member

It is based on nanoarrow, but is unrelated to anything in ADBC.

@zeroshade
Member

So after getting it to run with the updated build, I now got the failure to allocate memory error that was originally filed. I haven't been able to reproduce that with pure Go yet. So I'm not calling this solved just yet until I can determine the cause of that allocation error.

@zeroshade
Member

So I managed to reproduce the "cannot allocate memory" error locally and interactively. With a bit of manipulation I was able to get stack traces and look closer at the situation. The "cannot allocate memory" error is being directly returned by the call to calloc in Mallocator.Allocate. The request it fails on seems to be somewhere between 800 KB and a bit over 1 MB, so it's not failing due to an invalid (0 or negative) allocation size. The failure is also happening while I still have more than 10-12 GB of RAM available.

I'm only able to reproduce this issue via the python script in the repo provided by @bascheibler, I haven't been able to create a reproducer with just Go.

@zeroshade
Member

Either way, I don't think this is related to the race condition, so I think the race condition should be fixed on its own and then we can look into the "cannot allocate memory" separately. I'll keep digging into this but it just seems REALLY strange to me

@lidavidm
Member

What were both of the parameters for calloc?

@zeroshade
Member

First argument somewhere between 800,000 and 1,200,000; second argument is 1.
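For context, an allocation of that size is tiny by libc standards. A quick stdlib-only check (illustrative only; the driver's Go Mallocator wraps the same libc calloc, but this is not the driver's code path) shows such a request normally succeeds:

```python
import ctypes
import ctypes.util

# Load the C runtime and declare calloc/free signatures explicitly so the
# returned pointer isn't truncated to a 32-bit int.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

# Mirror the failing request: roughly 1 MB, as (nmemb ~1e6, size 1).
ptr = libc.calloc(1_200_000, 1)
print("calloc succeeded:", ptr is not None)
if ptr:
    libc.free(ptr)
```

On a healthy system with gigabytes free, calloc only returns NULL here under address-space exhaustion or resource limits, which is part of why this failure looked so strange.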

@lidavidm
Member

@zeroshade did that PR resolve both issues?

@zeroshade
Member

Gah, crap. No. It didn't resolve the unable to allocate issue. I forgot to update it before merging. My mistake

@zeroshade
Member

apache/arrow#40902 provides an upstream fix for this issue

@lidavidm lidavidm added this to the ADBC Libraries 1.0.0 milestone Mar 29, 2024
lidavidm pushed a commit that referenced this issue Apr 1, 2024
Fixes #1283 by incorporating the upstream fix
@ir3456

ir3456 commented Apr 23, 2024

I am still seeing this issue when running on machines running Ubuntu. No problem running on Mac.

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py:1089, in Cursor.fetch_arrow_table(self)
   1084 if self._results is None:
   1085     raise ProgrammingError(
   1086         "Cannot fetch_arrow_table() before execute()",
   1087         status_code=_lib.AdbcStatusCode.INVALID_STATE,
   1088     )
-> 1089 return self._results.fetch_arrow_table()

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/dbapi.py:1198, in _RowIterator.fetch_arrow_table(self)
   1197 def fetch_arrow_table(self) -> pyarrow.Table:
-> 1198     return _blocking_call(self._reader.read_all, (), {}, self._stmt.cancel)

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/_lib.pyx:1552, in adbc_driver_manager._lib._blocking_call_impl()

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/_lib.pyx:1545, in adbc_driver_manager._lib._blocking_call_impl()

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/_reader.pyx:90, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all()

File ~/isaac/venv/lib/python3.10/site-packages/adbc_driver_manager/_reader.pyx:40, in adbc_driver_manager._reader._AdbcErrorHelper.check_error()

OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory

In [3]: exit
(venv) dev@updated-backtest-5:~/isaac$ pip freeze | grep arrow
pyarrow==16.0.0
(venv) dev@updated-backtest-5:~/isaac$ pip freeze | grep adbc
adbc-driver-manager==0.11.0
adbc-driver-snowflake==0.11.0

I have the latest versions of pyarrow, adbc-driver-manager, and adbc-driver-snowflake. Is there something else I should be updating?

@lidavidm
Member

Oops, I think 0.11.0 didn't actually incorporate the fix (the release happened right before the fix got in).

There are nightly wheels here, do you want to try them? (Download the artifact "python-amd64-manylinux2014", extract it, and manually pip install path/to/wheel.whl) https://github.com/apache/arrow-adbc/actions/runs/8792621756

@ir3456

ir3456 commented Apr 24, 2024

Thanks for the help! That is working as expected and fixes the bug we were seeing. Any idea when we can expect this to be released?

@lidavidm
Member

I plan to kick off the release in the next week or two, actually. That will be called "1.0" because I believe the Snowflake driver (and a few others) are generally ready. (Still deciding whether I will spend the time to try to decouple our version numbers.)

So TL;DR by mid-May, if all goes well.

@ir3456

ir3456 commented Apr 24, 2024

Is there any chance that there will be a patch release 0.11.1 that we will be able to use before the 1.0?

@lidavidm
Member

For us all releases take an equal amount of effort so it would functionally be the same as waiting for the next release.
