Support for load_table_from_csv #384

Open
jbvaningen opened this issue Jan 29, 2025 · 0 comments
Labels
enhancement New feature or request

What would you like to be added?

Hi!

This is a nice project and I'm quite excited to use it for enhancing the integration test suite of our data platform. I'm running into an issue when uploading data from JSON with load_table_from_json, and am hoping support for it could be added.

The bug has similarities to these other issues:

Here is a code snippet to reproduce:

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

client = bigquery.Client(
    project="my_project",
    client_options=ClientOptions(api_endpoint="http://bigquery-emulator:9050"),
    credentials=AnonymousCredentials(),
)

# This works!
insert_job = client.query(
    query="INSERT INTO my_project.my_dataset.my_table (int_column) VALUES (1), (2);"
)
insert_job.result()

# This does not work
load_job = client.load_table_from_json(
    json_rows=[
        {"int_column": 3},
        {"int_column": 4},
    ],
    destination="my_project.my_dataset.my_table",
)
load_job.result()
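
Since plain INSERT queries do work against the emulator, a possible stopgap is to turn the same JSON rows into an INSERT statement. This is only a minimal sketch: `sql_literal` and `build_insert_sql` are hypothetical helper names (not part of google-cloud-bigquery), it assumes every row has the same keys as the first one, and it only handles simple scalar values.

```python
def sql_literal(value):
    """Render a Python scalar as a SQL literal (hypothetical helper)."""
    if value is None:
        return "NULL"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def build_insert_sql(table, rows):
    """Build one INSERT statement from a list of dict rows (hypothetical helper)."""
    if not rows:
        raise ValueError("rows must not be empty")
    # Assumption: all rows share the keys of the first row.
    columns = list(rows[0])
    values = ", ".join(
        "(" + ", ".join(sql_literal(row[col]) for col in columns) + ")"
        for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values};"


sql = build_insert_sql(
    "my_project.my_dataset.my_table",
    [{"int_column": 3}, {"int_column": 4}],
)
print(sql)
```

The resulting string could then be passed to `client.query(query=sql).result()` as in the working example above. It does no real protection against SQL injection, so it is only meant for trusted test fixtures.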

The emulator is running in Docker Compose, here is the service configuration:

  bigquery-emulator:
    container_name: bigquery-emulator
    image: ghcr.io/goccy/bigquery-emulator:latest
    platform: linux/x86_64  # no native version exists for ARM-based Mac
    volumes:
      - ./bigquery-emulator-testdata.yaml:/bigquery-emulator-testdata.yaml
    command: "--project my_project --data-from-yaml=/bigquery-emulator-testdata.yaml --log-level=debug"
    ports:
      - "9050:9050"

And the contents of bigquery-emulator-testdata.yaml:

projects:
- datasets:
  - id: my_dataset
    tables: 
    - columns:
      - name: int_column
        type: NUMERIC
      id: my_table
  id: my_project

This is the error raised by the client call:

>>> load_job.result()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/google/cloud/bigquery/job/base.py", line 966, in result
    self._begin(retry=retry, timeout=timeout)
  File "/usr/local/lib/python3.11/site-packages/google/cloud/bigquery/job/base.py", line 746, in _begin
    api_response = client._call_api(
                   ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/cloud/bigquery/client.py", line 837, in _call_api
    return call()
           ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
    return retry_target(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
    _retry_error_helper(
  File "/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
    raise final_exc from source_exc
  File "/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
    result = target()
             ^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.BadRequest: 400 POST http://bigquery-emulator:9050/bigquery/v2/projects/my_project/jobs?prettyPrint=false: unspecified job configuration query

And finally debug logs from the emulator:

2025-01-29 13:01:01 2025-01-29T12:01:01.425Z    INFO    server/middleware.go:63 GET /bigquery/v2/projects/my_project/datasets/my_dataset/tables/my_table    {"query": "prettyPrint=false"}
2025-01-29 13:01:01 2025-01-29T12:01:01.445Z    INFO    server/middleware.go:63 POST /upload/bigquery/v2/projects/my_project/jobs        {"query": "uploadType=multipart"}
2025-01-29 13:01:01 2025-01-29T12:01:01.935Z    INFO    server/middleware.go:63 POST /bigquery/v2/projects/my_project/jobs       {"query": "prettyPrint=false"}
2025-01-29 13:01:01 2025-01-29T12:01:01.950Z    ERROR   server/handler.go:1019  jobInternalError        {"error": "jobInternalError: unspecified job configuration query"}
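
A guess at the cause, based only on the error message and logs above and not on the emulator source: in the BigQuery REST API a job body carries its settings under `configuration.query`, `configuration.load`, `configuration.copy` or `configuration.extract`, and the message "unspecified job configuration query" suggests the jobs endpoint only looks for the `query` key. The sketch below illustrates the difference between the two request shapes; `job_type` is a hypothetical helper and the bodies are trimmed-down examples, not the exact payloads the client sends.

```python
# Trimmed-down example of a query job body (this kind of job works).
query_job_body = {
    "configuration": {
        "query": {"query": "SELECT 1", "useLegacySql": False},
    }
}

# Trimmed-down example of a load job body: there is no "query" key at all.
load_job_body = {
    "configuration": {
        "load": {
            "destinationTable": {
                "projectId": "my_project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
        }
    }
}


def job_type(body):
    """Return which kind of job configuration a request body carries."""
    config = body.get("configuration", {})
    for kind in ("query", "load", "copy", "extract"):
        if kind in config:
            return kind
    return None


print(job_type(query_job_body))
print(job_type(load_job_body))
```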

Would be great if this feature can be supported!

@jbvaningen jbvaningen added the enhancement New feature or request label Jan 29, 2025