
_delta_log items missing createdTime field #2944

Closed
Petterhg opened this issue Oct 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@Petterhg

Environment

Delta-rs version:
delta-rs.0.20.0

Binding:
python

Environment:

  • Cloud provider: aws
  • OS: ubuntu/aws fargate
  • Other:

Bug

What happened:
I'm running the Python client, with DynamoDB as the locking provider and S3 as storage. To catch schema changes, I have logic like this (which then propagates the schema change to BigQuery, which consumes the tables):

try:
    write_deltalake(
        f"{BUCKET}/{path}",
        data,
        schema=schema,
        mode="append",
        storage_options=storage_options,
        engine="rust",
        name=message.table_name,
        description=message.table_description,
        partition_by=message.partition_by,
    )
except SchemaMismatchError:
    write_deltalake(
        f"{BUCKET}/{path}",
        data,
        schema=schema,
        mode="append",
        storage_options=storage_options,
        schema_mode="merge",
        engine="rust",
        name=message.table_name,
        description=message.table_description,
        partition_by=message.partition_by,
    )

I THINK the error happens when the engine fails to write the items. It then creates a log entry in S3 with a null createdTime, like the one below, which BigQuery is unable to parse, and the complete table then fails to load. I don't know whether this is expected behaviour from the lib and it's just BigQuery not reading the metadata correctly, or whether this actually is a bug in the lib.

{
    "metaData": {
        "id": "e79c2746-c678-4686-adc7-5fc361f08334",
        "name": null,
        "description": null,
        "format": {
            "provider": "parquet",
            "options": {}
        },
        "schemaString": "{\"type\":\"struct\",\"fields\":[{\"name\":\"crawler_id\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"source_id\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"source_type\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"show_id\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"show_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"show_url\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_id\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_url\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"media_type\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_published_at\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_fetched_at\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"time_to_discovery_minutes\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_published_year\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_published_month\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_published_day\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"episode_audio_url\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"country\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"language\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}",
        "partitionColumns": [],
        "createdTime": null,
        "configuration": {}
    }
}
{
    "add": {
        "path": "part-00001-deae2bae-e35f-4504-a2cd-70407cc64328-c000.snappy.parquet",
        "partitionValues": {},
        "size": 9898,
        "modificationTime": 1729226114839,
        "dataChange": true,
        "stats": "{\"numRecords\":4,\"minValues\":{\"crawler_id\":\"987e918be3074bec8d1b797847e1f7be\",\"episode_published_at\":\"2024-10-18T08:00:00+02:00\",\"show_name\":\"Ekot nyhetssändning\",\"media_type\":\"audio\",\"episode_published_day\":18,\"show_url\":\"https://sverigesradio.se/ekot\",\"episode_fetched_at\":\"2024-10-18T06:32:37.174549+02:00\",\"episode_id\":\"2460304\",\"source_type\":\"radio\",\"time_to_discovery_minutes\":-117,\"episode_url\":\"https://sverigesradio.se/avsnitt/2460306\",\"source_id\":\"sverigesradio\",\"episode_published_year\":2024,\"episode_published_month\":10,\"episode_audio_url\":\"https://sverigesradio.se/topsy/ljudfil/9519131-hi\",\"episode_name\":\"Ekot 06:00 Terrormilisen Hizbollah meddelar att man trappar upp konflikten med Israel\",\"show_id\":\"sverigesradio_ekot_nyhetssandning\",\"country\":\"se\"},\"maxValues\":{\"time_to_discovery_minutes\":-84,\"show_url\":\"https://sverigesradio.se/ekotsenastenytt\",\"episode_published_year\":2024,\"episode_published_month\":10,\"country\":\"se\",\"show_id\":\"sverigesradio_ekot_senaste_nytt\",\"episode_url\":\"https://sverigesradio.se/avsnitt/ekot-0600-terrormilisen-hizbollah-meddelar-att-man-trappar-upp-konflikten-med-israel\",\"episode_id\":\"2465216\",\"crawler_id\":\"987e918be3074bec8d1b797847e1f7be\",\"media_type\":\"audio\",\"episode_published_at\":\"2024-10-18T08:30:00+02:00\",\"episode_audio_url\":\"https://sverigesradio.se/topsy/ljudfil/9519145-hi\",\"source_id\":\"sverigesradio\",\"episode_fetched_at\":\"2024-10-18T06:35:05.740971+02:00\",\"episode_name\":\"Ekot senaste nytt\",\"show_name\":\"Ekot senaste nytt\",\"episode_published_day\":18,\"source_type\":\"radio\"},\"nullCount\":{\"episode_published_month\":0,\"source_id\":0,\"time_to_discovery_minutes\":0,\"show_name\":0,\"country\":0,\"media_type\":0,\"show_id\":0,\"language\":4,\"episode_name\":0,\"source_type\":0,\"episode_published_at\":0,\"episode_fetched_at\":0,\"episode_published_year\":0,\"episode_audio_url\":0,\"episode_id\":0,\"crawler_id\":0,\"episode_url\":0,\"show_url\":0,\"episode_published_day\":0}}",
        "tags": null,
        "deletionVector": null,
        "baseRowId": null,
        "defaultRowCommitVersion": null,
        "clusteringProvider": null
    }
}
{
    "commitInfo": {
        "timestamp": 1729226114840,
        "operation": "WRITE",
        "operationParameters": {
            "mode": "Append"
        },
        "operationMetrics": {
            "execution_time_ms": 105,
            "num_added_files": 1,
            "num_added_rows": 4,
            "num_partitions": 0,
            "num_removed_files": 0
        },
        "clientVersion": "delta-rs.0.20.0"
    }
}

I CAN read the table when using the Python client, just not with BigQuery. Any advice here would be highly appreciated.
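[Editor's note] As a side note for anyone debugging the same symptom: each commit in `_delta_log` is a JSON-lines file (the pretty-printing above is only for readability), so after syncing the log locally you can scan for metaData actions whose createdTime is null before pointing BigQuery at the table. A minimal stdlib sketch; the function name and log path are my own, not part of the deltalake API:

```python
import json
from pathlib import Path

def find_null_created_time(log_dir: str) -> list[str]:
    """Return names of _delta_log commit files whose metaData action
    has createdTime set to null (the field BigQuery rejects)."""
    offenders = []
    for commit in sorted(Path(log_dir).glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            meta = action.get("metaData")
            if meta is not None and meta.get("createdTime") is None:
                offenders.append(commit.name)
    return offenders
```

Run against a local copy of the table's `_delta_log` directory, this lists exactly the commit files a strict reader like BigQuery would trip over.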

What you expected to happen:

How to reproduce it:

More details:

@Petterhg added the bug label Oct 18, 2024
@ion-elgreco
Collaborator

It's an optional field, so that's something Google needs to address in BigQuery.

@rtyler
Member

rtyler commented Oct 19, 2024

@Petterhg as @ion-elgreco mentioned createdTime is optional but unfortunately some delta implementations do not treat optional as actually optional 😭

#2926 does ensure that the createdTime gets set when a new metadata action is created, such as on schema evolution, so perhaps once I release 0.20.2 you would be able to modify the schema of that table or recreate it in order to make it BigQuery readable?
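[Editor's note] Until a release with that fix is available, one possible stopgap (a sketch only, not something the maintainers endorse: hand-editing the transaction log can confuse readers that have already cached the commit, so always experiment on a copy of the table first) would be to backfill the null createdTime from the commitInfo timestamp in the same commit file. The helper name is hypothetical:

```python
import json
from pathlib import Path

def backfill_created_time(commit_file: str) -> bool:
    """Fill a null metaData.createdTime in one _delta_log commit file,
    using the commitInfo timestamp from the same commit. Returns True
    if the file was rewritten. Illustration only; work on a copy."""
    path = Path(commit_file)
    actions = [json.loads(l) for l in path.read_text().splitlines() if l.strip()]
    # Real commit files are JSON-lines: one action object per line.
    ts = next((a["commitInfo"].get("timestamp")
               for a in actions if "commitInfo" in a), None)
    changed = False
    for action in actions:
        meta = action.get("metaData")
        if ts is not None and meta is not None and meta.get("createdTime") is None:
            meta["createdTime"] = ts
            changed = True
    if changed:
        path.write_text("".join(json.dumps(a) + "\n" for a in actions))
    return changed
```

After patching and re-uploading the commit file, a strict reader should see a populated createdTime; upgrading and letting a new metaData action be written, as suggested above, is the cleaner path.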

@delta-io delta-io locked and limited conversation to collaborators Oct 19, 2024
@rtyler rtyler converted this issue into discussion #2948 Oct 19, 2024
