
duckplyr 1.0.0 #724

Draft: wants to merge 57 commits into base: main


@maelle (Contributor) commented Feb 11, 2025

@krlmlr the "stingy" example does not work: it should generate an error but does not. 🤔

I'm a bit undecided regarding structure. I tried starting with basic usage, but even simply discussing library() vs individual activation via duckdb_tibble() is better done with some understanding of prudence, I think.

@maelle (Contributor, Author) commented Feb 11, 2025

Furthermore, this blog post might need a benchmark.

Maybe it could be structured around "why bother" (despite already having code that works without duckplyr, despite the fallbacks and some "annoying" incompatibilities like factors and timezones): duckplyr already works fairly well, and is under active development.

And the choice is IMHO probably not duckplyr vs dplyr but rather duckplyr vs other dplyr backends. So large data support is crucial.

@krlmlr (Member) left a comment

duckdb_tibble() needs the dot, the other functions don't.
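
For illustration, a minimal sketch of that difference, assuming duckplyr 1.0.0 is installed. The column `x` and the object names `a` and `b` are made up for the example and do not come from the post.

```r
library(duckplyr)

# duckdb_tibble() builds a new frame from columns passed via `...`,
# so its prudence argument carries a dot to avoid clashing with column names.
a <- duckdb_tibble(x = 1:3, .prudence = "lavish")

# as_duckdb_tibble() converts an existing data frame; its argument has no dot.
b <- as_duckdb_tibble(data.frame(x = 1:3), prudence = "lavish")
```

Without the dot, `prudence = "lavish"` passed to `duckdb_tibble()` lands in `...` and is treated as column data rather than as a setting, which is the issue flagged in the snippet under review.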


```{r}
out <- babynames |>
  duckdb_tibble(prudence = "lavish") |> # default value of prudence :-)
```
@krlmlr (Member) commented:

Use as_duckdb_tibble() or .prudence:

Suggested change
- duckdb_tibble(prudence = "lavish") |> # default value of prudence :-)
+ as_duckdb_tibble(prudence = "lavish") |> # default value of prudence :-)

@maelle (Contributor, Author) commented:

oooh 🤦‍♀️

- [computation to files](https://duckplyr.tidyverse.org/reference/compute_file.html) using `compute_parquet()` or `compute_csv()`.

A drawback of analyzing large data with duckplyr is that fallbacks cannot compensate for duckplyr's limitations, since falling back to dplyr requires loading the data into memory.
Therefore, if your pipeline encounters fallbacks, you might want to work around them by converting the duck frame into a table through `compute()`, then running SQL code through the experimental `read_sql_duckdb()` function. Again, over time, we expect more native support for dplyr functionality.
@maelle (Contributor, Author) commented:

@krlmlr could we tweak the example to use ceiling(), which I think isn't supported? That way it'd look more realistic. (I do not know SQL 🙈 )


2 participants