From 8b6d0bdb16f3362d49e56df2f593e9387f3c40af Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 07:38:15 +0100 Subject: [PATCH 01/26] Space at EOL --- content/blog/duckplyr-1-0-0/index.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 989d6bee..e22b57c6 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -15,7 +15,7 @@ photo: author: Kiril Gruev # one of: "deep-dive", "learn", "package", "programming", "roundup", or "other" -categories: [package] +categories: [package] tags: - duckplyr - dplyr @@ -35,7 +35,7 @@ TODO: * [x] `usethis::use_tidy_thanks()` --> -We're very chuffed to announce the release of [duckplyr](https://duckplyr.tidyverse.org) 1.0.0. +We're very chuffed to announce the release of [duckplyr](https://duckplyr.tidyverse.org) 1.0.0. duckplyr is a drop-in, fully compatible replacement for dplyr, powered by [DuckDB](https://duckdb.org/) for speed. It joins the rank of dplyr backends together with [dtplyr](https://dtplyr.tidyverse.org) and [dbplyr](https://dbplyr.tidyverse.org). You can use it instead of dplyr for data small or large. @@ -173,7 +173,7 @@ Our goals for future development of duckplyr include: - Enabling users to provide [custom translations](https://github.com/tidyverse/duckplyr/issues/158) of dplyr functionality; - Making it easier to contribute code to duckplyr. -You can help! +You can help! - Please report any issue especially regarding unknown incompabilities. See [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). - Contribute to the codebase after reading duckplyr's [contributing guide](https://duckplyr.tidyverse.org/CONTRIBUTING.html). 
From e25fb079f3e24436fb705ff15fda07a3db6dc161 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 07:50:34 +0100 Subject: [PATCH 02/26] Sentence --- content/blog/duckplyr-1-0-0/index.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index e22b57c6..72161ae8 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -6,9 +6,9 @@ title: duckplyr fully joins the tidyverse! date: 2025-02-11 author: Kirill Müller and Maëlle Salmon description: > - duckplyr 1.0.0 is on CRAN and part of the tidyverse! duckplyr is a drop-in - replacement for dplyr, powered by DuckDB for speed. It is the most dplyr-like - of dplyr backends. + duckplyr 1.0.0 is on CRAN and part of the tidyverse! + A drop-in replacement for dplyr, powered by DuckDB for speed. + It is the most dplyr-like of dplyr backends. photo: url: https://www.pexels.com/photo/a-mallard-duck-on-water-6918877/ From a0b9b3982e566f10f8fcc6c92bf2b8c67e00540e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 07:50:39 +0100 Subject: [PATCH 03/26] FIXME --- content/blog/duckplyr-1-0-0/index.Rmd | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 72161ae8..e5c4d1c2 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -40,6 +40,14 @@ duckplyr is a drop-in, fully compatible replacement for dplyr, powered by [DuckD It joins the rank of dplyr backends together with [dtplyr](https://dtplyr.tidyverse.org) and [dbplyr](https://dbplyr.tidyverse.org). You can use it instead of dplyr for data small or large. 
+ + You can install it from CRAN with: ```{r, eval = FALSE} From b90e8af1ae648168b735161a1773379a5827b2dd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 07:50:43 +0100 Subject: [PATCH 04/26] Shorten --- content/blog/duckplyr-1-0-0/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index e5c4d1c2..230b59dc 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -54,7 +54,7 @@ You can install it from CRAN with: install.packages("duckplyr") ``` -In this article, we'll introduce you to the basic concepts behind duckplyr, show how it can help you handle normal sized but also large data, and explain how you can help improve the package. +In this article, we'll introduce you to the basic concepts behind duckplyr, show how it can help you handle data of different sizes, and explain how you can help improve the package. ## A drop-in replacement for dplyr From 5ac6f6e707ef07699d984d90629b01c14b03a908 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 07:50:47 +0100 Subject: [PATCH 05/26] Verbose link --- content/blog/duckplyr-1-0-0/index.Rmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 230b59dc..4c9daf49 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -60,7 +60,8 @@ In this article, we'll introduce you to the basic concepts behind duckplyr, show The duckplyr package is a _drop-in replacement for dplyr_ that uses _DuckDB for speed_. You can simply _drop_ duckplyr into your pipeline by loading it, then computations will be efficiently carried out by DuckDB. -DuckDB is a [fast database system](https://www.youtube.com/watch?v=GELhdezYmP0&feature=youtu.be). +DuckDB is a fast in-memory analytical database system. 
+If you haven't heard about it, watch [Hannes Mühleisen's keynote at posit::conf(2024)](https://www.youtube.com/watch?v=GELhdezYmP0&feature=youtu.be). ```{r} library(conflicted) From 14ec2f64e3107e453f9b4b801a2d55a2de7e8ba2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 07:54:21 +0100 Subject: [PATCH 06/26] Not dying on this particular hill here --- content/blog/duckplyr-1-0-0/index.Rmd | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 4c9daf49..01b64a97 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -67,8 +67,7 @@ If you haven't heard about it, watch [Hannes Mühleisen's keynote at posit::conf library(conflicted) library(duckplyr) conflict_prefer("filter", "dplyr", quiet = TRUE) -library("babynames") - +library(babynames) out <- babynames |> filter(n > 1000) |> @@ -106,7 +105,7 @@ Then, the data manipulation pipeline uses the exact same syntax as a dplyr pipel The duckplyr package performs the computation using DuckDB. 
```{r} -library("babynames") +library(babynames) out <- babynames |> filter(n > 1000) |> summarize( From b9a277a126bdd771f45704eab06fd116290b8ec2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 07:58:34 +0100 Subject: [PATCH 07/26] Tweak query, let's see --- content/blog/duckplyr-1-0-0/index.Rmd | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 01b64a97..5585a7c3 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -70,14 +70,13 @@ conflict_prefer("filter", "dplyr", quiet = TRUE) library(babynames) out <- babynames |> - filter(n > 1000) |> + mutate(is_frequent = (prop >= 0.01)) |> summarize( - .by = c(sex, year), + .by = c(sex, year, is_frequent), babies_n = sum(n) ) |> filter(sex == "F") class(out) - ``` Like with other dplyr backends like dtplyr and dbplyr, duckplyr allows you to get faster results. From 5762c0a1623d6d97ea887cdd49251b19a0d141f3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 07:58:48 +0100 Subject: [PATCH 08/26] Prune --- content/blog/duckplyr-1-0-0/index.Rmd | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 5585a7c3..58f9b81e 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -81,10 +81,8 @@ class(out) Like with other dplyr backends like dtplyr and dbplyr, duckplyr allows you to get faster results. Unlike other dplyr backends, duckplyr does not require you to learn a different syntax. - The duckplyr package is fully compatible with dplyr: if an operation cannot be carried out with DuckDB, it is automatically outsourced to dplyr. -In that case, the operation is not slower than dplyr but not faster either. 
-The duckplyr package is actively developed so that over time, we expect fewer and fewer fallbacks to dplyr to be needed. +Over time, we expect fewer and fewer fallbacks to dplyr to be needed. ## How to use duckplyr From 6aaf9538f99b56b54a164829e7eee8a0e1bd2a3a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:17:32 +0100 Subject: [PATCH 09/26] This works --- content/blog/duckplyr-1-0-0/index.Rmd | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 58f9b81e..d62db655 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -70,13 +70,14 @@ conflict_prefer("filter", "dplyr", quiet = TRUE) library(babynames) out <- babynames |> - mutate(is_frequent = (prop >= 0.01)) |> + mutate(prevalence = if_else(prop >= 0.01, "frequent", "rare")) |> summarize( - .by = c(sex, year, is_frequent), + .by = c(sex, year, prevalence), babies_n = sum(n) ) |> filter(sex == "F") class(out) +out ``` Like with other dplyr backends like dtplyr and dbplyr, duckplyr allows you to get faster results. From f8477365f8c6bf982d6c1bb2c7f015424ccbe5d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:17:57 +0100 Subject: [PATCH 10/26] Tweak narrative --- content/blog/duckplyr-1-0-0/index.Rmd | 40 +++++++++++++++------------ 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index d62db655..b3418954 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -87,30 +87,34 @@ Over time, we expect fewer and fewer fallbacks to dplyr to be needed. ## How to use duckplyr -To _replace_ dplyr with duckplyr, you can either +To _replace_ dplyr with duckplyr, you can: -- load duckplyr and then keep your pipeline as is. 
Calling `library(duckplyr)` overwrites dplyr methods, enabling duckplyr for the entire session no matter how data.frames are created. +- Load duckplyr and then keep your pipeline as is. Calling `library(duckplyr)` overwrites dplyr methods, enabling duckplyr for the entire session no matter how data.frames are created. + This is shown in the example above. -```{r} -library(conflicted) -library(duckplyr) -conflict_prefer("filter", "dplyr", quiet = TRUE) -``` +- Create individual "duck frames" using _conversion functions_ like `duckdb_tibble()` or `as_duckdb_tibble()`, or _ingestion functions_ like `read_csv_duckdb()`. + Then, the data manipulation pipeline uses the exact same syntax as a dplyr pipeline. + The duckplyr package performs the computation using DuckDB. + + ```{r} + # Undo the effect of library(duckplyr) + methods_restore() -- Create individual "duck frames" which allows you to control their automatic materialization parameters to [protect memory](https://duckplyr.tidyverse.org/articles/prudence.html). To do so, you can use _conversion functions_ like `duckdb_tibble()` or `as_duckdb_tibble()`, or _ingestion functions_ like `read_csv_duckdb()`. + out <- babynames |> + as_duckdb_tibble() |> + mutate(prevalence = if_else(prop >= 0.01, "frequent", "rare")) |> + summarize( + .by = c(sex, year, prevalence), + babies_n = sum(n) + ) |> + filter(sex == "F") + class(out) + ``` -Then, the data manipulation pipeline uses the exact same syntax as a dplyr pipeline. -The duckplyr package performs the computation using DuckDB. +In both cases, printing the result only shows the first few rows, as with dbplyr. ```{r} -library(babynames) -out <- babynames |> - filter(n > 1000) |> - summarize( - .by = c(sex, year), - babies_n = sum(n) - ) |> - filter(sex == "F") +out ``` The result can finally be materialized to memory, or computed temporarily, or computed to a file. 
From c78073f52ca159191fe90c5bbc817520d36d5fff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:18:21 +0100 Subject: [PATCH 11/26] Choose pivoting as an important op not yet supported --- content/blog/duckplyr-1-0-0/index.Rmd | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index b3418954..fa291626 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -121,29 +121,26 @@ The result can finally be materialized to memory, or computed temporarily, or co ```{r} # to memory -out +collect(out) # to a file csv_file <- withr::local_tempfile() -file.size(csv_file) compute_csv(out, csv_file) -file.size(csv_file) +fs::file_size(csv_file) ``` When duckplyr itself does not support specific functionality, it falls back to dplyr. -For instance, row names are not supported yet: +For instance, pivoting is not supported yet, still it works thanks to the fallback mechanism. ```{r} -mtcars |> - summarize( - .by = cyl, - disp = mean(disp, na.rm = TRUE), - sd = sd(disp, na.rm = TRUE) - ) +out |> + tidyr::pivot_wider(names_from = prevalence, values_from = babies_n, values_fill = 0L) |> + mutate(share_frequent = frequent / (frequent + rare)) ``` -Current limitations are documented in a [vignette](https://duckplyr.tidyverse.org/articles/limits.html). -You can change the verbosity of fallbacks, refer to [`duckplyr::fallback_sitrep()`](https://duckplyr.tidyverse.org/reference/fallback.html). +For performance reasons, the output order of the result is not guaranteed to be stable. +If you need a stable order, you can use `arrange()`. +Other limitations are documented in [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). 
### For large data From 1f898c0316e7be622295cf307b028e4b9bf25112 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:18:42 +0100 Subject: [PATCH 12/26] Link style --- content/blog/duckplyr-1-0-0/index.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index fa291626..891854a4 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -151,8 +151,8 @@ With large datasets, you want: - input data in an efficient format, like Parquet files, which duckplyr allows thanks to its ingestion functions like `read_parquet_duckdb()`. - efficient computation, which duckplyr provides via DuckDB's holistic optimization, without your having to use another syntax than dplyr. - the output to not clutter all the memory, which duckplyr supports through two features: - - the [control of automatic materialization](https://duckplyr.tidyverse.org/articles/prudence.html) (collection of results into memory) thanks to the `prudence` parameter. You can disable automatic materialization completely or, as a compromise, disable it up to a certain output size. - - [computation to files](https://duckplyr.tidyverse.org/reference/compute_file.html) using `compute_parquet()` or `compute_csv()`. + - computation to files using [`compute_parquet()`](https://duckplyr.tidyverse.org/reference/compute_file.html) or [`compute_csv()`](https://duckplyr.tidyverse.org/reference/compute_file.html). + - the control of automatic materialization (collection of results into memory). You can disable automatic materialization completely or, as a compromise, disable it up to a certain output size. 
See [`vignette("prudence")`](https://duckplyr.tidyverse.org/articles/prudence.html) for details A drawback of analyzing large data with duckplyr is that the limitations of duckplyr won't be compensated by fallbacks, since fallbacks to dplyr necessitate putting data into memory. Therefore, if your pipeline encounters fallbacks, you might want to work around them by converting the duck frame into a table through `compute()` then running SQL code through the experimental `read_sql_duckdb()` function. Again, over time, we expect more native support for dplyr functionality. From 5a1f22c005ac3686beb46d01d557ded3dcc89e57 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:18:50 +0100 Subject: [PATCH 13/26] aeolus --- content/blog/duckplyr-1-0-0/index.Rmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 891854a4..0865b3c7 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -155,7 +155,8 @@ With large datasets, you want: - the control of automatic materialization (collection of results into memory). You can disable automatic materialization completely or, as a compromise, disable it up to a certain output size. See [`vignette("prudence")`](https://duckplyr.tidyverse.org/articles/prudence.html) for details A drawback of analyzing large data with duckplyr is that the limitations of duckplyr won't be compensated by fallbacks, since fallbacks to dplyr necessitate putting data into memory. -Therefore, if your pipeline encounters fallbacks, you might want to work around them by converting the duck frame into a table through `compute()` then running SQL code through the experimental `read_sql_duckdb()` function. Again, over time, we expect more native support for dplyr functionality. 
+Therefore, if your pipeline encounters fallbacks, you might want to work around them by converting the duck frame into a table through `compute()` then running SQL code through the experimental `read_sql_duckdb()` function. +Again, over time, we expect more native support for dplyr functionality. ```{r} data <- From 2b4b421a1e1b4e5cfc64ec1cb87bf17320db8615 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:21:05 +0100 Subject: [PATCH 14/26] Help --- content/blog/duckplyr-1-0-0/index.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 0865b3c7..5eaf3fc5 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -177,13 +177,13 @@ sql_data Our goals for future development of duckplyr include: -- Increasing the native support for dplyr functionality; - Enabling users to provide [custom translations](https://github.com/tidyverse/duckplyr/issues/158) of dplyr functionality; -- Making it easier to contribute code to duckplyr. +- Making it easier to contribute code to duckplyr; +- Supporting more dplyr and tidyr functionality natively in DuckDB. You can help! -- Please report any issue especially regarding unknown incompabilities. See [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). +- Please report any issues, especially regarding unknown incompatibilities. See [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). - Contribute to the codebase after reading duckplyr's [contributing guide](https://duckplyr.tidyverse.org/CONTRIBUTING.html). - Turn on telemetry to help us hear about the most frequent fallbacks so we can prioritize working on the corresponding missing dplyr translation. 
See [`vignette("telemetry")`](https://duckplyr.tidyverse.org/articles/telemetry.html) and the [`duckplyr::fallback_sitrep()`](https://duckplyr.tidyverse.org/reference/fallback.html) function. From d97b03168cca8b73c5e88f63bfa8b7d42b30cbef Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:22:23 +0100 Subject: [PATCH 15/26] Exclude maintainers --- content/blog/duckplyr-1-0-0/index.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 5eaf3fc5..0d9391d1 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -189,6 +189,6 @@ You can help! ## Acknowledgements -A big thanks to all 54 folks who filed issues, created PRs and generally helped to improve duckplyr! +A big thanks to all folks who filed issues, created PRs and generally helped to improve duckplyr! -[@adamschwing](https://github.com/adamschwing), [@andreranza](https://github.com/andreranza), [@apalacio9502](https://github.com/apalacio9502), [@apsteinmetz](https://github.com/apsteinmetz), [@barracuda156](https://github.com/barracuda156), [@beniaminogreen](https://github.com/beniaminogreen), [@bob-rietveld](https://github.com/bob-rietveld), [@brichards920](https://github.com/brichards920), [@cboettig](https://github.com/cboettig), [@davidjayjackson](https://github.com/davidjayjackson), [@DavisVaughan](https://github.com/DavisVaughan), [@Ed2uiz](https://github.com/Ed2uiz), [@eitsupi](https://github.com/eitsupi), [@era127](https://github.com/era127), [@etiennebacher](https://github.com/etiennebacher), [@eutwt](https://github.com/eutwt), [@fmichonneau](https://github.com/fmichonneau), [@github-actions[bot]](https://github.com/github-actions[bot]), [@hadley](https://github.com/hadley), [@hannes](https://github.com/hannes), [@hawkfish](https://github.com/hawkfish), [@IndrajeetPatil](https://github.com/IndrajeetPatil), 
[@JanSulavik](https://github.com/JanSulavik), [@JavOrraca](https://github.com/JavOrraca), [@jeroen](https://github.com/jeroen), [@jhk0530](https://github.com/jhk0530), [@joakimlinde](https://github.com/joakimlinde), [@JosiahParry](https://github.com/JosiahParry), [@krlmlr](https://github.com/krlmlr), [@larry77](https://github.com/larry77), [@lnkuiper](https://github.com/lnkuiper), [@lorenzwalthert](https://github.com/lorenzwalthert), [@luisDVA](https://github.com/luisDVA), [@maelle](https://github.com/maelle), [@math-mcshane](https://github.com/math-mcshane), [@meersel](https://github.com/meersel), [@multimeric](https://github.com/multimeric), [@mytarmail](https://github.com/mytarmail), [@nicki-dese](https://github.com/nicki-dese), [@PMassicotte](https://github.com/PMassicotte), [@prasundutta87](https://github.com/prasundutta87), [@rafapereirabr](https://github.com/rafapereirabr), [@Robinlovelace](https://github.com/Robinlovelace), [@romainfrancois](https://github.com/romainfrancois), [@sparrow925](https://github.com/sparrow925), [@stefanlinner](https://github.com/stefanlinner), [@thomasp85](https://github.com/thomasp85), [@TimTaylor](https://github.com/TimTaylor), [@Tmonster](https://github.com/Tmonster), [@toppyy](https://github.com/toppyy), [@wibeasley](https://github.com/wibeasley), [@yjunechoe](https://github.com/yjunechoe), [@ywhcuhk](https://github.com/ywhcuhk), and [@zhjx19](https://github.com/zhjx19). 
+[@adamschwing](https://github.com/adamschwing), [@andreranza](https://github.com/andreranza), [@apalacio9502](https://github.com/apalacio9502), [@apsteinmetz](https://github.com/apsteinmetz), [@barracuda156](https://github.com/barracuda156), [@beniaminogreen](https://github.com/beniaminogreen), [@bob-rietveld](https://github.com/bob-rietveld), [@brichards920](https://github.com/brichards920), [@cboettig](https://github.com/cboettig), [@davidjayjackson](https://github.com/davidjayjackson), [@DavisVaughan](https://github.com/DavisVaughan), [@Ed2uiz](https://github.com/Ed2uiz), [@eitsupi](https://github.com/eitsupi), [@era127](https://github.com/era127), [@etiennebacher](https://github.com/etiennebacher), [@eutwt](https://github.com/eutwt), [@fmichonneau](https://github.com/fmichonneau), [@hadley](https://github.com/hadley), [@hannes](https://github.com/hannes), [@hawkfish](https://github.com/hawkfish), [@IndrajeetPatil](https://github.com/IndrajeetPatil), [@JanSulavik](https://github.com/JanSulavik), [@JavOrraca](https://github.com/JavOrraca), [@jeroen](https://github.com/jeroen), [@jhk0530](https://github.com/jhk0530), [@joakimlinde](https://github.com/joakimlinde), [@JosiahParry](https://github.com/JosiahParry), [@larry77](https://github.com/larry77), [@lnkuiper](https://github.com/lnkuiper), [@lorenzwalthert](https://github.com/lorenzwalthert), [@luisDVA](https://github.com/luisDVA), [@maelle](https://github.com/maelle), [@math-mcshane](https://github.com/math-mcshane), [@meersel](https://github.com/meersel), [@multimeric](https://github.com/multimeric), [@mytarmail](https://github.com/mytarmail), [@nicki-dese](https://github.com/nicki-dese), [@PMassicotte](https://github.com/PMassicotte), [@prasundutta87](https://github.com/prasundutta87), [@rafapereirabr](https://github.com/rafapereirabr), [@Robinlovelace](https://github.com/Robinlovelace), [@romainfrancois](https://github.com/romainfrancois), [@sparrow925](https://github.com/sparrow925), 
[@stefanlinner](https://github.com/stefanlinner), [@thomasp85](https://github.com/thomasp85), [@TimTaylor](https://github.com/TimTaylor), [@Tmonster](https://github.com/Tmonster), [@toppyy](https://github.com/toppyy), [@wibeasley](https://github.com/wibeasley), [@yjunechoe](https://github.com/yjunechoe), [@ywhcuhk](https://github.com/ywhcuhk), and [@zhjx19](https://github.com/zhjx19). From 4be5ea9d14aa6197554e1cd08dc2eef76ddde316 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:32:29 +0100 Subject: [PATCH 16/26] Thanks --- content/blog/duckplyr-1-0-0/index.Rmd | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 0d9391d1..53997d41 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -192,3 +192,5 @@ You can help! A big thanks to all folks who filed issues, created PRs and generally helped to improve duckplyr! [@adamschwing](https://github.com/adamschwing), [@andreranza](https://github.com/andreranza), [@apalacio9502](https://github.com/apalacio9502), [@apsteinmetz](https://github.com/apsteinmetz), [@barracuda156](https://github.com/barracuda156), [@beniaminogreen](https://github.com/beniaminogreen), [@bob-rietveld](https://github.com/bob-rietveld), [@brichards920](https://github.com/brichards920), [@cboettig](https://github.com/cboettig), [@davidjayjackson](https://github.com/davidjayjackson), [@DavisVaughan](https://github.com/DavisVaughan), [@Ed2uiz](https://github.com/Ed2uiz), [@eitsupi](https://github.com/eitsupi), [@era127](https://github.com/era127), [@etiennebacher](https://github.com/etiennebacher), [@eutwt](https://github.com/eutwt), [@fmichonneau](https://github.com/fmichonneau), [@hadley](https://github.com/hadley), [@hannes](https://github.com/hannes), [@hawkfish](https://github.com/hawkfish), [@IndrajeetPatil](https://github.com/IndrajeetPatil), 
[@JanSulavik](https://github.com/JanSulavik), [@JavOrraca](https://github.com/JavOrraca), [@jeroen](https://github.com/jeroen), [@jhk0530](https://github.com/jhk0530), [@joakimlinde](https://github.com/joakimlinde), [@JosiahParry](https://github.com/JosiahParry), [@larry77](https://github.com/larry77), [@lnkuiper](https://github.com/lnkuiper), [@lorenzwalthert](https://github.com/lorenzwalthert), [@luisDVA](https://github.com/luisDVA), [@maelle](https://github.com/maelle), [@math-mcshane](https://github.com/math-mcshane), [@meersel](https://github.com/meersel), [@multimeric](https://github.com/multimeric), [@mytarmail](https://github.com/mytarmail), [@nicki-dese](https://github.com/nicki-dese), [@PMassicotte](https://github.com/PMassicotte), [@prasundutta87](https://github.com/prasundutta87), [@rafapereirabr](https://github.com/rafapereirabr), [@Robinlovelace](https://github.com/Robinlovelace), [@romainfrancois](https://github.com/romainfrancois), [@sparrow925](https://github.com/sparrow925), [@stefanlinner](https://github.com/stefanlinner), [@thomasp85](https://github.com/thomasp85), [@TimTaylor](https://github.com/TimTaylor), [@Tmonster](https://github.com/Tmonster), [@toppyy](https://github.com/toppyy), [@wibeasley](https://github.com/wibeasley), [@yjunechoe](https://github.com/yjunechoe), [@ywhcuhk](https://github.com/ywhcuhk), and [@zhjx19](https://github.com/zhjx19). + +Special thanks to Joe Thorley ([@joethorley](https://github.com/joethorley)) for help with choosing the right words. 
From f344b9fb354f2a911f0e67c7aef6d4086fc2598d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:33:35 +0100 Subject: [PATCH 17/26] Link --- content/blog/duckplyr-1-0-0/index.Rmd | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 53997d41..5c4fb531 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -154,6 +154,8 @@ With large datasets, you want: - computation to files using [`compute_parquet()`](https://duckplyr.tidyverse.org/reference/compute_file.html) or [`compute_csv()`](https://duckplyr.tidyverse.org/reference/compute_file.html). - the control of automatic materialization (collection of results into memory). You can disable automatic materialization completely or, as a compromise, disable it up to a certain output size. See [`vignette("prudence")`](https://duckplyr.tidyverse.org/articles/prudence.html) for details +See [`vignette("large")`](https://duckplyr.tidyverse.org/articles/large.html) for a walkthrough and more details. + A drawback of analyzing large data with duckplyr is that the limitations of duckplyr won't be compensated by fallbacks, since fallbacks to dplyr necessitate putting data into memory. Therefore, if your pipeline encounters fallbacks, you might want to work around them by converting the duck frame into a table through `compute()` then running SQL code through the experimental `read_sql_duckdb()` function. Again, over time, we expect more native support for dplyr functionality. 
From a13315a3b1fcb2528bead6bef6da7da54a419218 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:36:52 +0100 Subject: [PATCH 18/26] Restore narrative --- content/blog/duckplyr-1-0-0/index.Rmd | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 5c4fb531..cbd22cc6 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -85,6 +85,27 @@ Unlike other dplyr backends, duckplyr does not require you to learn a different The duckplyr package is fully compatible with dplyr: if an operation cannot be carried out with DuckDB, it is automatically outsourced to dplyr. Over time, we expect fewer and fewer fallbacks to dplyr to be needed. +The very tagline of duckplyr, being a drop-in replacement for dplyr that uses DuckDB for speed, creates a tension: + +- When using dplyr, we are not used to explicitly collecting results, we simply access them: the data.frames are "eager" by default. + Adding a `collect()` step by default would confuse users and make "drop-in replacement" an exaggeration. + The collection of results, called materialization, has to be automatic by default. + Therefore, _duckplyr needs eagerness_! + +- The whole advantage of using DuckDB under the hood is letting DuckDB optimize computations, like dtplyr does with data.table. + _Therefore, duckplyr needs laziness_! + +As a consequence, duckplyr is lazy on the inside for all DuckDB operations but eager on the outside, thanks to [ALTREP](https://duckdb.org/2024/04/02/duckplyr.html#eager-vs-lazy-materialization), a powerful R feature that among other things supports *deferred evaluation*. + +> "ALTREP allows R objects to have different in-memory representations, and for custom code to be executed whenever those objects are accessed." Hannes Mühleisen. + +If the duckplyr data.frame is accessed by... 
+ +- duckplyr, then the operations continue to be lazy (until a call to `collect.duckplyr_df()` for instance). +- not duckplyr (say, ggplot2, `nrow()`, or a dplyr operation not yet supported by duckplyr), then a special callback is executed, allowing materialization of the data frame. + +Therefore, duckplyr can be both *lazy* (within itself) and *not lazy* (for the outside world). + ## How to use duckplyr To _replace_ dplyr with duckplyr, you can: From ad9825f52bf4f619a913f80040f5291a3d67ee87 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 08:39:37 +0100 Subject: [PATCH 19/26] Add vignette link --- content/blog/duckplyr-1-0-0/index.Rmd | 1 + 1 file changed, 1 insertion(+) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index cbd22cc6..2ddf1015 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -105,6 +105,7 @@ If the duckplyr data.frame is accessed by... - not duckplyr (say, ggplot2, `nrow()`, or a dplyr operation not yet supported by duckplyr), then a special callback is executed, allowing materialization of the data frame. Therefore, duckplyr can be both *lazy* (within itself) and *not lazy* (for the outside world). +See [`vignette("fallback")`](https://duckplyr.tidyverse.org/articles/fallback.html) for more details. ## How to use duckplyr From a734638b4fbc2bc71ffff05e2200812c850cf67a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 09:06:38 +0100 Subject: [PATCH 20/26] FIXME --- content/blog/duckplyr-1-0-0/index.Rmd | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 2ddf1015..4c3e3e67 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -215,6 +215,8 @@ You can help! A big thanks to all folks who filed issues, created PRs and generally helped to improve duckplyr! 
+ + [@adamschwing](https://github.com/adamschwing), [@andreranza](https://github.com/andreranza), [@apalacio9502](https://github.com/apalacio9502), [@apsteinmetz](https://github.com/apsteinmetz), [@barracuda156](https://github.com/barracuda156), [@beniaminogreen](https://github.com/beniaminogreen), [@bob-rietveld](https://github.com/bob-rietveld), [@brichards920](https://github.com/brichards920), [@cboettig](https://github.com/cboettig), [@davidjayjackson](https://github.com/davidjayjackson), [@DavisVaughan](https://github.com/DavisVaughan), [@Ed2uiz](https://github.com/Ed2uiz), [@eitsupi](https://github.com/eitsupi), [@era127](https://github.com/era127), [@etiennebacher](https://github.com/etiennebacher), [@eutwt](https://github.com/eutwt), [@fmichonneau](https://github.com/fmichonneau), [@hadley](https://github.com/hadley), [@hannes](https://github.com/hannes), [@hawkfish](https://github.com/hawkfish), [@IndrajeetPatil](https://github.com/IndrajeetPatil), [@JanSulavik](https://github.com/JanSulavik), [@JavOrraca](https://github.com/JavOrraca), [@jeroen](https://github.com/jeroen), [@jhk0530](https://github.com/jhk0530), [@joakimlinde](https://github.com/joakimlinde), [@JosiahParry](https://github.com/JosiahParry), [@larry77](https://github.com/larry77), [@lnkuiper](https://github.com/lnkuiper), [@lorenzwalthert](https://github.com/lorenzwalthert), [@luisDVA](https://github.com/luisDVA), [@maelle](https://github.com/maelle), [@math-mcshane](https://github.com/math-mcshane), [@meersel](https://github.com/meersel), [@multimeric](https://github.com/multimeric), [@mytarmail](https://github.com/mytarmail), [@nicki-dese](https://github.com/nicki-dese), [@PMassicotte](https://github.com/PMassicotte), [@prasundutta87](https://github.com/prasundutta87), [@rafapereirabr](https://github.com/rafapereirabr), [@Robinlovelace](https://github.com/Robinlovelace), [@romainfrancois](https://github.com/romainfrancois), [@sparrow925](https://github.com/sparrow925), 
[@stefanlinner](https://github.com/stefanlinner), [@thomasp85](https://github.com/thomasp85), [@TimTaylor](https://github.com/TimTaylor), [@Tmonster](https://github.com/Tmonster), [@toppyy](https://github.com/toppyy), [@wibeasley](https://github.com/wibeasley), [@yjunechoe](https://github.com/yjunechoe), [@ywhcuhk](https://github.com/ywhcuhk), and [@zhjx19](https://github.com/zhjx19). Special thanks to Joe Thorley ([@joethorley](https://github.com/joethorley)) for help with choosing the right words. From fc8122df4b45e7ae1722af09787ff44b9ebb5f09 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 09:24:37 +0100 Subject: [PATCH 21/26] Date --- content/blog/duckplyr-1-0-0/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 4c3e3e67..6a5ae01c 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -3,7 +3,7 @@ output: hugodown::hugo_document slug: duckplyr-1-0-0 title: duckplyr fully joins the tidyverse! -date: 2025-02-11 +date: 2025-02-13 author: Kirill Müller and Maëlle Salmon description: > duckplyr 1.0.0 is on CRAN and part of the tidyverse! From 3211710f3fd3892bc4dd8b3785ccb3b730213ff1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 09:24:42 +0100 Subject: [PATCH 22/26] Why bother --- content/blog/duckplyr-1-0-0/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 6a5ae01c..8c39dd16 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -92,7 +92,7 @@ The very tagline of duckplyr, being a drop-in replacement for dplyr that uses Du The collection of results, called materialization, has to be automatic by default. Therefore, _duckplyr needs eagerness_! 
-- The whole advantage of using DuckDB under the hood is letting DuckDB optimize computations, like dtplyr does with data.table. +- The whole advantage of using DuckDB under the hood is letting DuckDB optimize computations. This works best if DuckDB sees the entire pipeline. _Therefore, duckplyr needs laziness_! As a consequence, duckplyr is lazy on the inside for all DuckDB operations but eager on the outside, thanks to [ALTREP](https://duckdb.org/2024/04/02/duckplyr.html#eager-vs-lazy-materialization), a powerful R feature that among other things supports *deferred evaluation*. From eea955a3af9ec6e1ca3585bcf80772854d187b28 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 09:25:14 +0100 Subject: [PATCH 23/26] Level --- content/blog/duckplyr-1-0-0/index.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 8c39dd16..6f5fbecc 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -161,10 +161,10 @@ out |> ``` For performance reasons, the output order of the result is not guaranteed to be stable. -If you need a stable order, you can use `arrange()`. +If you need a stable order, you can use `arrange()` or force output order stability by setting an environment variable. Other limitations are documented in [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). -### For large data +## Large data For large data, duckplyr is a legitimate alternative to dtplyr and dbplyr. 
From 4a20ca304a65d669fecb73286e448ad13fce7b51 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 09:27:29 +0100 Subject: [PATCH 24/26] Move --- content/blog/duckplyr-1-0-0/index.Rmd | 46 ++++++++++++++------------- 1 file changed, 24 insertions(+), 22 deletions(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 6f5fbecc..afb6a2b0 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -85,28 +85,6 @@ Unlike other dplyr backends, duckplyr does not require you to learn a different The duckplyr package is fully compatible with dplyr: if an operation cannot be carried out with DuckDB, it is automatically outsourced to dplyr. Over time, we expect fewer and fewer fallbacks to dplyr to be needed. -The very tagline of duckplyr, being a drop-in replacement for dplyr that uses DuckDB for speed, creates a tension: - -- When using dplyr, we are not used to explicitly collect results, we simply access them: the data.frames are "eager" by default. - Adding a `collect()` step by default would confuse users and make "drop-in replacement" an exaggeration. - The collection of results, called materialization, has to be automatic by default. - Therefore, _duckplyr needs eagerness_! - -- The whole advantage of using DuckDB under the hood is letting DuckDB optimize computations. This works best if DuckDB sees the entire pipeline. - _Therefore, duckplyr needs laziness_! - -As a consequence, duckplyr is lazy on the inside for all DuckDB operations but eager on the outside, thanks to [ALTREP](https://duckdb.org/2024/04/02/duckplyr.html#eager-vs-lazy-materialization), a powerful R feature that among other things supports *deferred evaluation*. - -> "ALTREP allows R objects to have different in-memory representations, and for custom code to be executed whenever those objects are accessed." Hannes Mühleisen. - -If the duckplyr data.frame is accessed by... 
- -- duckplyr, then the operations continue to be lazy (until a call to `collect.duckplyr_df()` for instance). -- not duckplyr (say, ggplot2, `nrow()`, or a dplyr operation not yet supported by duckplyr), then a special callback is executed, allowing materialization of the data frame. - -Therefore, duckplyr can be both *lazy* (within itself) and *not lazy* (for the outside world). -See [`vignette("fallback")`](https://duckplyr.tidyverse.org/articles/fallback.html) for more details. - ## How to use duckplyr To _replace_ dplyr with duckplyr, you can: @@ -211,6 +189,30 @@ You can help! - Contribute to the codebase after reading duckplyr's [contributing guide](https://duckplyr.tidyverse.org/CONTRIBUTING.html). - Turn on telemetry to help us hear about the most frequent fallbacks so we can prioritize working on the corresponding missing dplyr translation. See [`vignette("telemetry")`](https://duckplyr.tidyverse.org/articles/telemetry.html) and the [`duckplyr::fallback_sitrep()`](https://duckplyr.tidyverse.org/reference/fallback.html) function. +## How does this even work? + +The very tagline of duckplyr, being a drop-in replacement for dplyr that uses DuckDB for speed, creates a tension: + +- When using dplyr, we are not used to explicitly collect results, we simply access them: the data.frames are "eager" by default. + Adding a `collect()` step by default would confuse users and make "drop-in replacement" an exaggeration. + The collection of results, called materialization, has to be automatic by default. + Therefore, _duckplyr needs eagerness_! + +- The whole advantage of using DuckDB under the hood is letting DuckDB optimize computations. This works best if DuckDB sees the entire pipeline. + _Therefore, duckplyr needs laziness_! 
+ +As a consequence, duckplyr is lazy on the inside for all DuckDB operations but eager on the outside, thanks to [ALTREP](https://duckdb.org/2024/04/02/duckplyr.html#eager-vs-lazy-materialization), a powerful R feature that among other things supports *deferred evaluation*. + +> "ALTREP allows R objects to have different in-memory representations, and for custom code to be executed whenever those objects are accessed." Hannes Mühleisen. + +If the duckplyr data.frame is accessed by... + +- duckplyr, then the operations continue to be lazy (until a call to `collect.duckplyr_df()` for instance). +- not duckplyr (say, ggplot2, `nrow()`, or a dplyr operation not yet supported by duckplyr), then a special callback is executed, allowing materialization of the data frame. + +Therefore, duckplyr can be both *lazy* (within itself) and *not lazy* (for the outside world). +See [`vignette("fallback")`](https://duckplyr.tidyverse.org/articles/fallback.html) for more details. + ## Acknowledgements A big thanks to all folks who filed issues, created PRs and generally helped to improve duckplyr! From f5e4a38b2e96b5eb5ef23bfd1c44c64268929945 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 09:27:48 +0100 Subject: [PATCH 25/26] Detail --- content/blog/duckplyr-1-0-0/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index afb6a2b0..66e4b3bc 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -140,7 +140,7 @@ out |> For performance reasons, the output order of the result is not guaranteed to be stable. If you need a stable order, you can use `arrange()` or force output order stability by setting an environment variable. -Other limitations are documented in [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). 
+This and other limitations are documented in [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). ## Large data From 20dff036ed16cccd710ca1ef808b73dcd377faa9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kirill=20M=C3=BCller?= Date: Thu, 13 Feb 2025 09:29:53 +0100 Subject: [PATCH 26/26] TBC --- content/blog/duckplyr-1-0-0/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/blog/duckplyr-1-0-0/index.Rmd b/content/blog/duckplyr-1-0-0/index.Rmd index 66e4b3bc..016bed78 100644 --- a/content/blog/duckplyr-1-0-0/index.Rmd +++ b/content/blog/duckplyr-1-0-0/index.Rmd @@ -54,7 +54,7 @@ You can install it from CRAN with: install.packages("duckplyr") ``` -In this article, we'll introduce you to the basic concepts behind duckplyr, show how it can help you data of different sizes, and explain how you can help improve the package. +In this article, we'll show how duckplyr can help you with data of different size, explain how you can help improve the package, and ... . ## A drop-in replacement for dplyr