feat(c/driver/postgresql): Support JSON and JSONB types #2072

Merged
merged 2 commits into from
Aug 12, 2024

@paleolimbot paleolimbot commented Aug 12, 2024

This PR adds support for the JSON and JSONB types. Before this PR, the raw COPY bytes were returned: for JSON, these were the bytes of the JSON string; for JSONB, they were the bytes of the JSON string prefixed by 0x01, which is the one and only version number of the JSONB COPY binary format.
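To make the difference between the two payloads concrete, here is a minimal Python sketch of the decoding described above. It is not the driver's actual code (the driver is written in C/C++); the function names and the `JSONB_VERSION` constant are hypothetical, and it assumes the field payload has already been extracted from the COPY stream.

```python
# Illustrative only: hypothetical helpers, not part of the ADBC driver.
JSONB_VERSION = 0x01  # the single version byte mentioned in the PR description


def decode_json_field(payload: bytes) -> str:
    """JSON: the field payload is just the UTF-8 bytes of the JSON text."""
    return payload.decode("utf-8")


def decode_jsonb_field(payload: bytes) -> str:
    """JSONB: one version byte (0x01) followed by the JSON text."""
    if not payload or payload[0] != JSONB_VERSION:
        raise ValueError("unexpected or missing JSONB version byte")
    # Drop the version prefix so callers see only the JSON string.
    return payload[1:].decode("utf-8")
```

Returning the payload unchanged for JSONB is what produced the stray "\x01" at the start of each value before this PR.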

This PR routes both types through the string type. We could also route them through the JSON canonical extension type by adding some metadata. I don't think an implementation of that type exists anywhere yet (but it might be functionally the same, since pyarrow/Arrow C++ would just drop the extension metadata).
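As a rough sketch of what "adding some metadata" could look like, the snippet below builds the field-level key/value pairs that mark a string column as an extension type. It assumes the standard Arrow extension metadata keys and the canonical name `arrow.json` with empty serialized metadata; the helper names are hypothetical, and this PR does not do this.

```python
# Illustrative only: plain dicts standing in for Arrow field metadata.
from __future__ import annotations

EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"
JSON_EXTENSION_NAME = "arrow.json"  # assumed canonical extension name


def json_extension_metadata() -> dict[str, str]:
    """Metadata a string field would carry to be tagged as JSON."""
    # Assumption: the JSON extension type has no parameters, so the
    # serialized metadata is left empty.
    return {EXTENSION_NAME_KEY: JSON_EXTENSION_NAME, EXTENSION_METADATA_KEY: ""}


def is_json_extension(field_metadata: dict[str, str]) -> bool:
    """True if the field metadata tags the column as the JSON extension type."""
    return field_metadata.get(EXTENSION_NAME_KEY) == JSON_EXTENSION_NAME
```

The storage would remain a string column either way, so a consumer that ignores or drops the extension metadata would see the same data as it does with this PR.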

The testing here is a bit repetitive...we could definitely reduce the duplication in the test cases 😬.

Closes #2068.

Reproducer in R:

library(adbcdrivermanager)
#> Warning: package 'adbcdrivermanager' was built under R version 4.3.3

con <- adbc_database_init(
  adbcpostgresql::adbcpostgresql(), 
  uri = "postgresql://localhost:5432/postgres?user=postgres&password=password"
) |> 
  adbc_connection_init()

lots_of_json_url <- "https://github.com/apache/arrow-experiments/raw/main/data/arrow-commits/arrow-commits.jsonl"
lines <- readLines(lots_of_json_url)

con |> 
  execute_adbc("DROP TABLE IF EXISTS much_json")

data.frame(lines = lines) |> 
  write_adbc(con, "much_json")

con |> 
  read_adbc("select lines::jsonb as lines from much_json") |> 
  tibble::as_tibble()
#> # A tibble: 15,487 × 1
#>    lines                                                                        
#>    <chr>                                                                        
#>  1 "{\"time\": \"2024-03-07T02:00:52\", \"files\": 2, \"merge\": false, \"commi…
#>  2 "{\"time\": \"2024-03-06T21:51:34\", \"files\": 1, \"merge\": false, \"commi…
#>  3 "{\"time\": \"2024-03-06T20:29:15\", \"files\": 1, \"merge\": false, \"commi…
#>  4 "{\"time\": \"2024-03-06T07:46:45\", \"files\": 1, \"merge\": false, \"commi…
#>  5 "{\"time\": \"2024-03-05T16:13:32\", \"files\": 1, \"merge\": false, \"commi…
#>  6 "{\"time\": \"2024-03-05T14:53:13\", \"files\": 20, \"merge\": false, \"comm…
#>  7 "{\"time\": \"2024-03-05T12:31:38\", \"files\": 2, \"merge\": false, \"commi…
#>  8 "{\"time\": \"2024-03-05T08:15:42\", \"files\": 6, \"merge\": false, \"commi…
#>  9 "{\"time\": \"2024-03-05T07:56:25\", \"files\": 2, \"merge\": false, \"commi…
#> 10 "{\"time\": \"2024-03-05T01:04:20\", \"files\": 1, \"merge\": false, \"commi…
#> # ℹ 15,477 more rows

Created on 2024-08-11 with reprex v2.1.0

@paleolimbot paleolimbot marked this pull request as ready for review August 12, 2024 02:49
@paleolimbot paleolimbot requested a review from lidavidm as a code owner August 12, 2024 02:49
@github-actions github-actions bot added this to the ADBC Libraries 14 milestone Aug 12, 2024
@lidavidm lidavidm merged commit 6eb5506 into apache:main Aug 12, 2024
69 checks passed
@paleolimbot paleolimbot deleted the c-driver-postgres-jsonb branch August 21, 2024 00:42
Successfully merging this pull request may close these issues.

adbc_driver_postgres add extra "\x01" to JSONB columns
2 participants