
scan_csv() in container fails with disk space error (e.g. AWS lambda, or container) #17946

Closed
2 tasks done
GBMsejimenez opened this issue Jul 30, 2024 · 27 comments
Labels: accepted (Ready for implementation), bug (Something isn't working), python (Related to Python Polars)

Comments

@GBMsejimenez

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import json
import boto3
import polars as pl

session = boto3.Session()
CREDENTIALS = session.get_credentials()
STORAGE_OPTIONS = {
    "aws_region": "us-east-1",
    "aws_access_key_id": CREDENTIALS.access_key,
    "aws_secret_access_key": CREDENTIALS.secret_key,
}
if CREDENTIALS.token:
    STORAGE_OPTIONS.update({"session_token": CREDENTIALS.token})

print(STORAGE_OPTIONS)

# Define the schema for reading the CSV file
SCHEMA = {
    "user_id": pl.Int32,
    "transaction_date": pl.Datetime,
    "order_id": pl.Int32,
    "price": pl.Float32,
    "quantity": pl.Int16,
    "item_id": pl.Int32,
    "item_desc": pl.Utf8,
}


def read_s3(uri: str) -> pl.LazyFrame:
    """
    Read a CSV file from S3 using Polars.

    :param uri: S3 URI of the CSV file.
    :return: Polars LazyFrame with the CSV data.
    """
    return pl.scan_csv(
        uri,
        schema_overrides=SCHEMA,
        ignore_errors=True,
        truncate_ragged_lines=True,
        storage_options=STORAGE_OPTIONS,
    )


def apply_rfm(df: pl.LazyFrame) -> pl.LazyFrame:
    """
    Calculate RFM scores for each user and segment them.

    :param df: Input dataframe.
    :return: Dataframe with RFM scores and segments.
    """

    df_rfm = df.group_by("user_id").agg(
        recency=pl.col("transaction_date").max(),  # Most recent transaction date
        frequency=pl.col("order_id").n_unique(),  # Number of unique orders
        monetary=pl.col("total_amount_plus_taxes").sum(),  # Total monetary value
    )
    latest_date = df.select(pl.col("transaction_date").max()).collect().item()
    df_rfm = df_rfm.with_columns(
        recency=(
            latest_date - pl.col("recency")
        ).dt.total_days()  # Calculate recency in days
    )

    print("RFM Calculated")
    return df_rfm


def handler(event: dict, context: dict) -> dict:
    try:
        uri = event["Records"][0]["s3"]["uri"]

        df = read_s3(uri)

        df = apply_rfm(df)
        
        return {
            "statusCode": 200,
            "body": json.dumps("RFM loaded to DataBase"),
        }

    except Exception as e:
        print(f"Error in RFM process: {e}")
        return {"statusCode": 500, "body": json.dumps("Error in RFM process")}

Log output

INIT_REPORT Init Duration: 10008.73 ms	Phase: init	Status: timeout
Error in RFM process: failed to allocate 25954093 bytes to download uri = s3://aws-us-east-1-dev-s3-xxx/xxx/dataset_processed302e6eea-f9ed-4df4-8ad5-b7c8eada0658.csv
This error occurred with the following context stack:
[1] 'csv scan' failed
[2] 'filter' input failed to resolve
[3] 'filter' input failed to resolve
[4] 'select' input failed to resolve
END RequestId: 3c9f6613-f850-48e4-8658-1b47af8d8786
REPORT RequestId: 3c9f6613-f850-48e4-8658-1b47af8d8786	Duration: 26514.89 ms	Billed Duration: 26515 ms	Memory Size: 10240 MB	Max Memory Used: 159 MB

Issue description

I'm new to Polars and attempting to implement an RFM analysis using the library. As part of my proposed architecture, I need to run the code in an AWS Lambda function. I've successfully implemented the RFM analysis and uploaded the code to Lambda using a Docker image.

Despite the code running successfully in my local container, I'm encountering a "failed to allocate 25954093 bytes" error when running it in the Lambda function. I've tried to troubleshoot the issue by ruling out credential errors (the scan_csv call itself doesn't throw) and by explicitly passing AWS credentials to the scan_csv function.

Attempts to Resolve
I've attempted to apply solutions from issues #7774 and #1777, including:

Setting streaming=True on the collect method
Defining my schema columns as pl.Utf8 or integer types (a minimal sketch of both attempts follows below)
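For reference, a minimal sketch of both attempts, reusing the read_s3/SCHEMA/STORAGE_OPTIONS definitions from the reproducible example above (the S3 URI here is only a placeholder):

import polars as pl

# Attempt 1: force the streaming engine when collecting (Polars 1.x API).
lf = read_s3("s3://example-bucket/example.csv")  # placeholder URI
df = lf.collect(streaming=True)

# Attempt 2: relax the schema overrides to plain string columns.
RELAXED_SCHEMA = {name: pl.Utf8 for name in SCHEMA}
lf = pl.scan_csv(
    "s3://example-bucket/example.csv",  # placeholder URI
    schema_overrides=RELAXED_SCHEMA,
    storage_options=STORAGE_OPTIONS,
)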

Thanks in advance 🤗

Expected behavior

The Polars code should work seamlessly in the Lambda function, just as it does in the local container, without any memory allocation errors.

Installed versions

--------Version info---------
Polars:               1.3.0
Index type:           UInt32
Platform:             Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.34
Python:               3.12.3 (main, Jun  5 2024, 03:37:09) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)]       

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                2.0.1
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              17.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

@GBMsejimenez GBMsejimenez added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jul 30, 2024
@ritchie46
Member

Can you set POLARS_PANIC_ON_ERR=1 and RUST_BACKTRACE=1 and show us the backtrace log?
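For anyone reproducing this in a Lambda container image, a minimal sketch of one way to set those flags is to export them in the handler module before any Polars I/O runs (they can equally be set as environment variables in the Lambda function configuration):

import os

# Debug flags requested above; set them in the process environment so the
# Rust side of Polars sees them.
os.environ["POLARS_PANIC_ON_ERR"] = "1"
os.environ["RUST_BACKTRACE"] = "1"

import polars as pl  # imported after setting the flags, to be safe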

@wjglenn3

Hi, I'm not sure if I should start another issue for this, but I'm pretty sure I'm having the same problem. When running inside an AWS Lambda, I am able to read a CSV and write it to a Parquet file using read_csv and write_parquet, but I have not had much luck with scan_csv and sink_parquet. I'm getting the same type of error and have tried the same methods as @GBMsejimenez to solve it.

I've gotten the code down to the bare minimum needed to reproduce the error (the CSV file being tested consists of only a header and two lines of data, and the bucket and path in the file name have been edited out).

import json
import os

import polars as pl
import s3fs

# These were intended as environment flags; plain Python assignments have no
# effect, so export them via os.environ instead.
os.environ["POLARS_PANIC_ON_ERR"] = "1"
os.environ["RUST_BACKTRACE"] = "1"
 
# Lambda entry
def lambda_handler(event, context):
    
    pl.show_versions()
    
    csv_file = 's3://{BUCKET}/{PATH}/test.csv'
    #parquet_file = 's3://{BUCKET}/{PATH}/test.parquet'

    fs = s3fs.S3FileSystem(anon=False)

    df = pl.scan_csv(csv_file).collect(streaming=True)
    

    return {
        'statusCode': 200,
        'body': json.dumps("Finished")
    }

This gives me the following error (with {BUCKET} and {PATH} replaced by actual values):

[ERROR] ComputeError: failed to allocate 1343 bytes to download uri = s3://{BUCKET}/{PATH}/test.csv
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 40, in lambda_handler
    df = pl.scan_csv(csv_file).collect()
  File "/opt/python/polars/lazyframe/frame.py", line 2027, in collect
    return wrap_df(ldf.collect(callback))

My polars versions if necessary

--------Version info---------
Polars:               1.4.1
Index type:           UInt32
Platform:             Linux-5.10.219-229.866.amzn2.x86_64-x86_64-with-glibc2.26
Python:               3.11.6 (main, Feb  7 2024, 11:27:56) [GCC 7.3.1 20180712 (Red Hat 7.3.1-17)]
----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.1
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

@qmg-tmay

@wjglenn3 I'm experiencing the same issue when using a Docker container-based Lambda.

@HectorPascual

HectorPascual commented Sep 13, 2024

Hey, we are experiencing the same issue with Docker in AWS Lambda; we have attempted all the combinations.

I also tried installing s3fs, which is needed for read_csv, but it also breaks with this error:

ComputeError : failed to allocate 12345 bytes to download uri = s3://...

Here's my minimal example that breaks:

import asyncio

import boto3
import polars as pl
import uvloop

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

session = boto3.session.Session(region_name="us-west-2")
credentials = session.get_credentials().get_frozen_credentials()
storage_options = {
    "aws_access_key_id": credentials.access_key,
    "aws_secret_access_key": credentials.secret_key,
    "aws_session_token": credentials.token,
    "aws_region": session.region_name,
}


async def do():
    df = pl.scan_csv(
        "s3://.../*.csv",  # example path
        storage_options=storage_options,
    ).collect()
    print(df)


def lambda_handler(event, context):
    uvloop.run(do())
    return "OK"

@alexander-beedie could you please be so kind as to look into this issue?

Thank you for the efforts!

@AnskeVan

Hi there,

Not sure that it matters, but I'm having the exact same issue here using a Docker image in AWS Lambda, when collecting my LazyFrame with its execution plan. Hopefully, the more people report running into this, the higher the fix gets prioritized...

The LazyFrame explain plan is:

WITH_COLUMNS:
......
   SELECT 
   ...........
   FROM
     WITH_COLUMNS:
     [false.alias("monthly_export_origin")
	 , String(abfs://.../.../../filename.csv).alias("export_filename")
	 , String(2024-11-22T10:17:11.599+00:00).str.strptime([String(raise)]).alias("rec_inserted")] 
      Csv SCAN [abfs://.../.../../filename.csv]
      PROJECT 65/65 COLUMNS

I have cut out the select columns and some simple with_columns statements from this execution plan, as well as the exact abfs path and filename, but it is trying to scan a CSV file from an Azure container. The code runs fine locally, but not within Lambda, with the exact same error message as described above:
failed to allocate 122128901 bytes to download uri = abfs://.../.../../filename.csv

Cheers!

@HectorPascual


Hi,

From my understanding and my attempts, there's a bug preventing scan_csv from working inside Lambda's Docker runtime. Hopefully someone can give more context here.

@AnskeVan


It might be a bug on Lambda's side rather than Polars'? I have premium support there, so I might create a ticket for AWS to investigate from their side. I'll let you know in this thread if and when anything comes out of that...

@jerome-viveret-onfido

For me, the very same thing happens (a memory allocation error) on collect_schema() when applied to a lazy frame. It is worth noting that it happens with scan_csv only, not with scan_parquet.

@nameexhaustion
Collaborator

This error is due to insufficient disk space: scan_csv requires enough disk space for the entire file to be downloaded. We may improve this in the future as streaming functionality improves.
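If the file genuinely does not fit on disk, one workaround on the AWS side is to raise the function's ephemeral storage. A hedged sketch using boto3 (the function name is a placeholder; Size is in MB and Lambda accepts 512 to 10,240):

import boto3

# Sketch: increase the Lambda function's /tmp ephemeral storage so a large
# CSV download can fit; "my-polars-function" is a placeholder name.
client = boto3.client("lambda")
client.update_function_configuration(
    FunctionName="my-polars-function",
    EphemeralStorage={"Size": 4096},  # MB
)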

@nameexhaustion nameexhaustion added enhancement New feature or an improvement of an existing feature new-streaming Features for or dependent on the new streaming engine and removed needs triage Awaiting prioritization by a maintainer labels Nov 28, 2024
@nameexhaustion nameexhaustion changed the title Groupby using lazy mode on a csv throw an memory allocation error when running on AWS lambda scan_csv() fails if there is not enough disk space to download the whole file (e.g. AWS lambda, or container) Nov 28, 2024
@nameexhaustion nameexhaustion removed the bug Something isn't working label Nov 28, 2024
@qmg-tmay


I've encountered this issue when using a test file of only a few KB. The issue still occurs even if there is plenty of available disk space in the Lambda runtime.

@HectorPascual


That's correct. I've experienced the same issue with files of only a few KB using scan_csv, while read_csv works fine.

@jerome-viveret-onfido

jerome-viveret-onfido commented Nov 28, 2024

Could it have to do with the system primitives that are used to determine the temporary directory, and how that interacts with the Lambda ephemeral storage?

@HectorPascual


That could make sense: some kind of permission issue, or a small disk partition for the temp path. It would be interesting to see where Polars attempts to download the file, since we specify no path for it in the lazy scans.

@AnskeVan


I also really don't think it is due to insufficient disk space: the minimal example I created for AWS support fails to scan a tiny CSV file less than 7 KB in size. The space it is trying to allocate is 6750 bytes, with 512 MB of ephemeral storage allocated. @jerome-viveret-onfido's comment makes more sense. AWS is looking into this as well.

@nameexhaustion
Collaborator

For debugging, setting the POLARS_VERBOSE=1 environment variable will print the path of the temporary directory.

It can be changed to a mount point with more storage by setting the POLARS_TEMP_DIR environment variable.
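A minimal sketch of that setup in a Lambda handler module (the cache directory name here is just an example):

import os

# Log the file-cache path and point the cache at a directory under writable /tmp.
os.environ["POLARS_VERBOSE"] = "1"
os.environ["POLARS_TEMP_DIR"] = "/tmp/polars-cache"  # example path

import polars as pl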

@AnskeVan

AnskeVan commented Dec 3, 2024

Using POLARS_VERBOSE, I can see it is trying to write to /tmp/polars/file-cache/.
If I change the path (with POLARS_TEMP_DIR) to just /tmp, I still get the error: failed to create temporary directory: path = '/tmp', err = Read-only file system (os error 30)
The problem here is indeed specific to using Lambda with Docker. Just running in Lambda, the ephemeral storage would probably work even without changing the default tmp dir path, but I don't think you can use the actual ephemeral store from a Docker container in Lambda. In accordance with this, the AWS docs state the following:
The container image must be able to run on a read-only file system. Your function code can access a writable /tmp directory with between 512 MB and 10,240 MB, in 1-MB increments, of storage.
Being able to run on a read-only file system is clearly not the case when you use scan_csv.
Docker containers based on Linux images do have a /tmp folder, but it can only be read from, not written to.
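As a quick sanity check from inside the function, a sketch like this reports whether /tmp is writable and how much space is actually free, which helps separate the read-only-filesystem theory from a genuine lack of disk space:

import os
import shutil

# In-Lambda diagnostic: is /tmp writable, and how much space is free?
print("/tmp writable:", os.access("/tmp", os.W_OK))
print("/tmp free bytes:", shutil.disk_usage("/tmp").free)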

@HectorPascual

I went through the issue once again with a colleague. I'll leave some conclusions down here; they might be a reference for future investigations or ideas about the issue:

My current hypothesis is that since the cached files go under /tmp/file-cache, not directly to /tmp, AWS Lambda fails to write to the ephemeral storage. It would be interesting to modify this path in the Rust source to write directly to /tmp instead of a subdirectory and see if it works.

@HectorPascual

HectorPascual commented Dec 12, 2024

In addition to the above, I tried running the following chunk of code inside Docker in AWS Lambda, thinking that it would fail, but it worked. So my hypothesis about writes to subfolders of /tmp not working is no longer valid... 🤔

import os
from typing import Any

def handler(event: Any, context: Any):
    os.makedirs("/tmp/a")
    with open("/tmp/a/a.txt", "w") as f:
        f.write("a")

    with open("/tmp/a/a.txt", "r") as f:
        print(f.read())

Maybe it's just the allocate operation that fails, and the write wouldn't? This is the system call that Polars uses to allocate space for the file to be written: https://wasix.org/docs/api-reference/wasi/fd_allocate

@HectorPascual

HectorPascual commented Dec 18, 2024

Please adjust the title of the issue, since it doesn't describe the problem accurately.

The issue is about running Polars (scan_csv) inside a Docker container in AWS Lambda, and scan_csv fails even when there is disk space available.

Thanks in advance!

@nameexhaustion nameexhaustion changed the title scan_csv() fails if there is not enough disk space to download the whole file (e.g. AWS lambda, or container) scan_csv() in container fails with disk space error (e.g. AWS lambda, or container) Dec 19, 2024
@nameexhaustion
Collaborator

@HectorPascual, I have added an environment flag, POLARS_IGNORE_FILE_CACHE_ALLOCATE_ERROR=1, that will be available in the next release. Could you give it a try (together with POLARS_VERBOSE=1 set) to see if it helps?

@mattyellen

I also just ran into this bug. In my environment the Lambda is already writing to /tmp, and as others have reported, it is failing with very small allocations. This suggests it's not a disk space issue, nor a read-only filesystem.

Furthermore, as @HectorPascual noted, the failure is happening in this function call: file.allocate(remote_metadata.size). So it's not just trying to write to disk; it's trying to execute an fd_allocate syscall. The specific error being returned is:

Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }

This suggests AWS Lambda (and perhaps Docker in general) is preventing this syscall. To test this, I attempted the same operation in Python:

import logging
import os

logger = logging.getLogger(__name__)

fd = os.open("/tmp/my_file.txt", os.O_RDWR | os.O_CREAT)
try:
    logger.info("Allocating 1024 bytes from the beginning of the file")
    os.posix_fallocate(fd, 0, 1024)
finally:
    os.close(fd)

And sure enough, it fails with a similar error:

  "errorMessage": "[Errno 1] Operation not permitted",
  "errorType": "PermissionError",
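A companion sketch (an assumption based on the earlier /tmp write test in this thread, not verified against Lambda's sandbox documentation): extending the same file with ftruncate or a plain write is expected to succeed, which would point at the allocate call specifically being blocked rather than /tmp being unwritable:

import os

# Create the same file and extend it without posix_fallocate.
fd = os.open("/tmp/my_file.txt", os.O_RDWR | os.O_CREAT)
try:
    os.ftruncate(fd, 1024)      # extend to 1024 bytes without an explicit allocate
    os.pwrite(fd, b"hello", 0)  # plain write, exercising the same disk path
finally:
    os.close(fd)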

@HectorPascual


Hey, that's a very good example for reproducing the error. Was it run in AWS Lambda too?

You can try setting the flag @nameexhaustion mentioned and see if the result changes: POLARS_IGNORE_FILE_CACHE_ALLOCATE_ERROR=1. I wasn't able to check it yet; will check ASAP.

@nameexhaustion nameexhaustion added bug Something isn't working accepted Ready for implementation and removed enhancement New feature or an improvement of an existing feature new-streaming Features for or dependent on the new streaming engine labels Jan 19, 2025
@github-project-automation github-project-automation bot moved this to Ready in Backlog Jan 19, 2025
@nameexhaustion nameexhaustion self-assigned this Jan 19, 2025
@mattyellen

I did try setting that POLARS_IGNORE_FILE_CACHE_ALLOCATE_ERROR environment variable. It seemed to get past the first check but then failed later on. I could collect more information if necessary, but it looks like we may already have a fix.

@Ayusharma0698

I am having the same issue:
polars.exceptions.ComputeError: failed to allocate 4044667404 bytes to download uri.
My scan_csv call looks something like this:

pl.scan_csv(
    s3_path,
    storage_options={
        "aws_access_key_id": credentials.access_key,
        "aws_secret_access_key": credentials.secret_key,
        "region": os.environ["REGION"],
        "session_token": credentials.token,
    },
    infer_schema_length=10000,
    ignore_errors=True,
    truncate_ragged_lines=True,
    skip_rows=skip_first_rows,
    separator=delimiter,
    has_header=with_header,
    glob=False,
    encoding="utf8-lossy",
)

The CSV file is approximately 3.8 GB. Any suggestions would be appreciated.

@AnskeVan


Read the full thread :-) and you'll see it is fixed in #20796. The fix made it into the Python Polars 1.21.0 release, so just upgrade your Polars version.
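For completeness, a quick post-upgrade check (assuming pip; pl.__version__ is the standard version attribute):

# After: pip install --upgrade "polars>=1.21.0"
import polars as pl

# Confirm the running interpreter picked up a release that includes the fix.
assert tuple(map(int, pl.__version__.split(".")[:3])) >= (1, 21, 0), pl.__version__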

@ritchie46
Member

@nameexhaustion can we close this one now?

@nameexhaustion
Collaborator

Closed as completed via #20796

@github-project-automation github-project-automation bot moved this from Ready to Done in Backlog Jan 28, 2025