Comparing DuckDB/Arrow Performance

R
benchmark
How to evaluate memory and CPU usage for long-running processes in duckdb/arrow
Author

Nicolas Chuche

Published

July 10, 2025

When comparing different approaches, the ideal is to run the code through benchmarking tools, but the “classic” R tools are not well suited to comparing duckdb and/or arrow code.
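To make the limitation concrete, here is a minimal base-R sketch of the kind of measurement a "classic" approach gives you. The helper name `measure_r_only` is hypothetical and is not part of timemoir or any package. It relies on `system.time()` and `gc()`, and `gc()` only reports memory managed by R itself, so memory that duckdb or arrow allocate in their own native code would most likely not show up in its figures.

```r
# Hypothetical helper (not part of timemoir): time an expression and report
# the peak memory that R's own garbage collector observed while it ran.
measure_r_only <- function(expr) {
  invisible(gc(reset = TRUE))      # reset the "max used" counters
  timing <- system.time(expr)      # expr is lazy, so it is evaluated here
  mem <- gc()                      # last column is "max used" in Mb
  list(
    elapsed_s = timing[["elapsed"]],
    r_peak_mb = sum(mem[, ncol(mem)])  # Ncells + Vcells, R heap only
  )
}

out <- measure_r_only(mean(rnorm(1e6)))
str(out)
```

The reported peak is approximate even for pure R code, and it says nothing about the process as a whole, which is the figure that matters for long-running duckdb/arrow jobs.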

In my articles, I will regularly use timemoir, written specifically for this type of comparison:

library(timemoir)

test_function <- function(n) {
  x <- rnorm(n); mean(x)
}

res <- timemoir(
  test_function(1.2e7),
  test_function(4e7),
  test_function(1e8)
)
res |> 
  kableExtra::kable()
fname                   duration  error  start_mem  max_mem  cpu_user  cpu_sys
test_function(1.2e+07)     1.823     NA     110012   204736     1.455    0.137
test_function(4e+07)       4.600     NA     109296   423636     3.996    0.276
test_function(1e+08)       9.564     NA     109232   892384     9.065    0.495
plot(res)


That said, these are not “true” rigorous benchmarks, which are well beyond the scope of this blog, but rather quick comparisons intended to give a rough idea of relative performance.
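As a quick sanity check on the results table above, the memory growth of each run can be compared with the expected size of the vector created by `rnorm(n)`, at 8 bytes per double. The sketch below copies the numbers from the table. Reading the memory columns as KiB is an inference from this arithmetic, not something stated in the post.

```r
# Values copied from the results table above.
runs <- data.frame(
  n         = c(1.2e7, 4e7, 1e8),
  start_mem = c(110012, 109296, 109232),
  max_mem   = c(204736, 423636, 892384)
)

# Memory growth during each call, in the table's own units.
runs$delta <- runs$max_mem - runs$start_mem

# Expected size of one double vector of length n, in KiB (8 bytes per double).
runs$expected_kib <- runs$n * 8 / 1024

# A ratio close to 1 suggests the memory columns are in KiB (an assumption).
runs$ratio <- runs$delta / runs$expected_kib
print(runs)
```

The growth tracks the size of a single numeric vector to within about 1 percent for all three runs, which is what one would expect from a function that allocates one vector and then reduces it to a mean.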

devtools::session_info(pkgs = "attached")
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.0 (2025-04-11)
 os       Ubuntu 22.04.5 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Etc/UTC
 date     2025-08-09
 pandoc   3.7.0.2 @ /usr/bin/ (via rmarkdown)
 quarto   1.7.31 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package  * version    date (UTC) lib source
 timemoir * 0.8.0.9000 2025-08-09 [1] Github (nbc/timemoir@646734a)

 [1] /usr/local/lib/R/site-library
 [2] /usr/local/lib/R/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────