How to evaluate memory and CPU usage for long-running processes in duckdb/arrow
Author: Nicolas Chuche
Published: July 10, 2025
When comparing different approaches, the ideal is to run the code through a benchmarking tool, but the “classic” R tools are not well suited to comparing duckdb and/or arrow code:
tictoc only reports elapsed (wall-clock) time, not CPU time or memory
bench does not detect memory allocated by duckdb and arrow, because it only tracks allocations made on R's own heap
…
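The sketch below lets you observe the bench limitation directly. It assumes the bench, DBI and duckdb packages are installed; the temporary table name `t` and the sizes are arbitrary choices for illustration. bench::mark() reports `mem_alloc` from R's memory profiling, which does not track memory allocated outside the R heap. One would therefore expect the R expression to report roughly 76 MiB and the DuckDB expression only a small amount, even though DuckDB stores the same number of doubles in its own memory.

```r
library(bench)
library(DBI)
library(duckdb)

con <- dbConnect(duckdb::duckdb())  # in-memory DuckDB database

res_bench <- bench::mark(
  # R allocates a 1e7-element double vector (8e7 bytes, about 76 MiB) on its heap
  r_vector = length(rnorm(1e7)),
  # DuckDB materialises a 1e7-row table of doubles in its own memory,
  # outside R's allocator, so bench cannot see it
  duckdb_table = dbExecute(con, "CREATE OR REPLACE TEMP TABLE t AS
                                 SELECT random() AS r FROM range(10000000)"),
  check = FALSE,   # the two expressions return different values
  iterations = 3
)

# Compare the memory bench attributes to each expression
res_bench[, c("expression", "median", "mem_alloc")]

dbDisconnect(con)
```

The exact figures depend on the machine and package versions. The point is the gap between the two `mem_alloc` values, not their absolute size.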
Throughout my articles, I regularly use timemoir, a package written specifically for this kind of comparison:
```r
library(timemoir)

test_function <- function(n) {
  x <- rnorm(n)
  mean(x)
}

res <- timemoir(test_function(1.2e7), test_function(4e7), test_function(1e8))
res |> kableExtra::kable()
```
| fname | duration | error | start_mem | max_mem | cpu_user | cpu_sys |
|---|---|---|---|---|---|---|
| test_function(1.2e+07) | 1.823 | NA | 110012 | 204736 | 1.455 | 0.137 |
| test_function(4e+07) | 4.600 | NA | 109296 | 423636 | 3.996 | 0.276 |
| test_function(1e+08) | 9.564 | NA | 109232 | 892384 | 9.065 | 0.495 |
```r
plot(res)
```
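As a quick sanity check on the columns, the sketch below copies the figures from the timemoir table into a plain data frame. The table does not state units, so the sketch assumes that `start_mem` and `max_mem` are in KiB and that `duration`, `cpu_user` and `cpu_sys` are in seconds. It compares the peak memory increase with the theoretical size of the vector created by rnorm(n), which is 8 bytes per double. It also computes total CPU time per second of elapsed time. The column names `mem_ratio` and `cpu_ratio` are introduced here for illustration.

```r
# Values copied from the timemoir table.
# Assumptions: start_mem/max_mem in KiB, duration/cpu_* in seconds.
res_tbl <- data.frame(
  n         = c(1.2e7, 4e7, 1e8),
  duration  = c(1.823, 4.600, 9.564),
  start_mem = c(110012, 109296, 109232),
  max_mem   = c(204736, 423636, 892384),
  cpu_user  = c(1.455, 3.996, 9.065),
  cpu_sys   = c(0.137, 0.276, 0.495)
)

# Memory: peak increase vs. theoretical size of rnorm(n) (8 bytes per double)
res_tbl$delta_kib  <- res_tbl$max_mem - res_tbl$start_mem
res_tbl$vector_kib <- res_tbl$n * 8 / 1024
res_tbl$mem_ratio  <- round(res_tbl$delta_kib / res_tbl$vector_kib, 3)

# CPU: total CPU time (user + system) per second of elapsed time.
# A value well above 1 would mean several cores were busy at once.
res_tbl$cpu_ratio <- round((res_tbl$cpu_user + res_tbl$cpu_sys) / res_tbl$duration, 2)

res_tbl[, c("n", "delta_kib", "vector_kib", "mem_ratio", "cpu_ratio")]
```

With these figures, `mem_ratio` comes out at about 1.010, 1.006 and 1.002. The peak is almost entirely explained by the 8 × n bytes of the vector, which is consistent with the KiB assumption. `cpu_ratio` stays at or below 1 (0.87, 0.93, 1.00), as expected for single-threaded R code. For duckdb or arrow queries, values above 1 would signal multi-threaded execution.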
That said, these are not rigorous benchmarks, which would be well beyond the scope of this blog. They are quick comparisons meant to give a rough idea of relative performance.
Session Information
```r
devtools::session_info(pkgs = "attached")
```

```
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.0 (2025-04-11)
 os       Ubuntu 22.04.5 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Etc/UTC
 date     2025-08-09
 pandoc   3.7.0.2 @ /usr/bin/ (via rmarkdown)
 quarto   1.7.31 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package  * version    date (UTC) lib source
 timemoir * 0.8.0.9000 2025-08-09 [1] Github (nbc/timemoir@646734a)

 [1] /usr/local/lib/R/site-library
 [2] /usr/local/lib/R/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────
```