Collect and profile a lazy query.

Description

This runs the query and returns a list containing the materialized DataFrame and a second DataFrame with profiling information for each node that was executed.

Usage

<LazyFrame>$profile(
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  streaming = FALSE,
  no_optimization = FALSE,
  inherit_optimization = FALSE,
  collect_in_background = FALSE,
  show_plot = FALSE,
  truncate_nodes = 0
)

Arguments

type_coercion Logical. Coerce types such that operations succeed and run on minimal required memory.
predicate_pushdown Logical. Applies filters as early as possible at scan level.
projection_pushdown Logical. Select only the columns that are needed at the scan level.
simplify_expression Logical. Various optimizations, such as constant folding and replacing expensive operations with faster alternatives.
slice_pushdown Logical. Only load the required slice from the scan level. Don’t materialize sliced outputs (e.g. join$head(10)).
comm_subplan_elim Logical. Will try to cache branching subplans that occur on self-joins or unions.
comm_subexpr_elim Logical. Common subexpressions will be cached and reused.
streaming Logical. Run parts of the query in a streaming fashion (this is in an alpha state).
no_optimization Logical. Sets the following parameters to FALSE: predicate_pushdown, projection_pushdown, slice_pushdown, comm_subplan_elim, comm_subexpr_elim.
inherit_optimization Logical. Use existing optimization settings regardless of the settings specified in this function call.
collect_in_background Logical. Detach this query from the R session. Computation will start in the background, and a handle is returned which can later be converted into the resulting DataFrame. Useful in interactive mode to avoid locking the R session.
show_plot Logical. Show a Gantt chart of the profiling result.
truncate_nodes Truncate the label lengths in the Gantt chart to this number of characters. If 0 (default), do not truncate.
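
As a rough illustration of how the optimization arguments can be used, the sketch below profiles the same query twice, once with the defaults and once with no_optimization = TRUE. The query itself (a filter followed by a select on iris) and the variable names are only assumptions chosen for the example; the nodes and timings reported will differ between machines and polars versions.

```r
library(polars)

# A small lazy query, used here purely for illustration
lf = pl$LazyFrame(iris)$
  filter(pl$col("Sepal.Length") > 5)$
  select("Species", "Sepal.Length")

# Profile with the default optimizations
optimized = lf$profile()

# Profile again with the pushdown and subplan/subexpression optimizations off
unoptimized = lf$profile(no_optimization = TRUE)

# Compare the executed nodes and their timings side by side
optimized$profile
unoptimized$profile
```

Both calls should return the same $result; only the $profile element is expected to differ.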

Details

The units of the timings are microseconds.
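
Since start and end are both reported in microseconds, the time spent in each node is the difference between the two. The following is a minimal sketch of that calculation, assuming the timings are first converted to a base R data.frame; the names out, timings, and duration_us are made up for this example.

```r
library(polars)

out = pl$LazyFrame(iris)$
  sort("Sepal.Length")$
  group_by("Species", maintain_order = TRUE)$
  agg(pl$col(pl$Float64)$first() + 5)$
  profile()

# Convert the timings to a base R data.frame
timings = as.data.frame(out$profile)

# Duration of each node in microseconds
timings$duration_us = timings$end - timings$start

# Slowest nodes first
timings[order(timings$duration_us, decreasing = TRUE), ]
```

Sorting by duration makes it easier to see which node dominates the run time when a query has many steps.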

Value

List of two DataFrames: one with the collected result, the other with the timings of each step. If show_plot = TRUE, then the plot is also stored in the list.

See Also

  • $collect() - regular collect.
  • $fetch() - fast limited query check.
  • $collect_in_background() - non-blocking collect that returns a future handle. Can also be used via $collect(collect_in_background = TRUE).
  • $sink_parquet() - streams the query to a Parquet file.
  • $sink_ipc() - streams the query to an Arrow file.

Examples

library(polars)

# Simplest use case
pl$LazyFrame()$select(pl$lit(2) + 2)$profile()
#> $result
#> shape: (1, 1)
#> ┌─────────┐
#> │ literal │
#> │ ---     │
#> │ f64     │
#> ╞═════════╡
#> │ 4.0     │
#> └─────────┘
#> 
#> $profile
#> shape: (2, 3)
#> ┌─────────────────┬───────┬─────┐
#> │ node            ┆ start ┆ end │
#> │ ---             ┆ ---   ┆ --- │
#> │ str             ┆ u64   ┆ u64 │
#> ╞═════════════════╪═══════╪═════╡
#> │ optimization    ┆ 0     ┆ 24  │
#> │ select(literal) ┆ 24    ┆ 98  │
#> └─────────────────┴───────┴─────┘
# Use $profile() to compare two queries

# -1-  map each Species-group with native polars; the group_by node takes only ~400us in the run below
pl$LazyFrame(iris)$
  sort("Sepal.Length")$
  group_by("Species", maintain_order = TRUE)$
  agg(pl$col(pl$Float64)$first() + 5)$
  profile()
#> $result
#> shape: (3, 5)
#> ┌────────────┬──────────────┬─────────────┬──────────────┬─────────────┐
#> │ Species    ┆ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width │
#> │ ---        ┆ ---          ┆ ---         ┆ ---          ┆ ---         │
#> │ cat        ┆ f64          ┆ f64         ┆ f64          ┆ f64         │
#> ╞════════════╪══════════════╪═════════════╪══════════════╪═════════════╡
#> │ setosa     ┆ 9.3          ┆ 8.0         ┆ 6.1          ┆ 5.1         │
#> │ versicolor ┆ 9.9          ┆ 7.4         ┆ 8.3          ┆ 6.0         │
#> │ virginica  ┆ 9.9          ┆ 7.5         ┆ 9.5          ┆ 6.7         │
#> └────────────┴──────────────┴─────────────┴──────────────┴─────────────┘
#> 
#> $profile
#> shape: (3, 3)
#> ┌────────────────────┬───────┬─────┐
#> │ node               ┆ start ┆ end │
#> │ ---                ┆ ---   ┆ --- │
#> │ str                ┆ u64   ┆ u64 │
#> ╞════════════════════╪═══════╪═════╡
#> │ optimization       ┆ 0     ┆ 16  │
#> │ sort(Sepal.Length) ┆ 16    ┆ 591 │
#> │ group_by(Species)  ┆ 594   ┆ 985 │
#> └────────────────────┴───────┴─────┘
# -2-  map each Species-group of each numeric column with an R function; the group_by node takes ~54000us in the run below (slow!)

# some R function, prints `.` for each time called by polars
r_func = \(s) {
  cat(".")
  s$to_r()[1] + 5
}

pl$LazyFrame(iris)$
  sort("Sepal.Length")$
  group_by("Species", maintain_order = TRUE)$
  agg(pl$col(pl$Float64)$map_elements(r_func))$
  profile()
#> ............

#> $result
#> shape: (3, 5)
#> ┌────────────┬────────────────────┬───────────────────┬────────────────────┬───────────────────┐
#> │ Species    ┆ Sepal.Length_apply ┆ Sepal.Width_apply ┆ Petal.Length_apply ┆ Petal.Width_apply │
#> │ ---        ┆ ---                ┆ ---               ┆ ---                ┆ ---               │
#> │ cat        ┆ f64                ┆ f64               ┆ f64                ┆ f64               │
#> ╞════════════╪════════════════════╪═══════════════════╪════════════════════╪═══════════════════╡
#> │ setosa     ┆ 9.3                ┆ 8.0               ┆ 6.1                ┆ 5.1               │
#> │ versicolor ┆ 9.9                ┆ 7.4               ┆ 8.3                ┆ 6.0               │
#> │ virginica  ┆ 9.9                ┆ 7.5               ┆ 9.5                ┆ 6.7               │
#> └────────────┴────────────────────┴───────────────────┴────────────────────┴───────────────────┘
#> 
#> $profile
#> shape: (3, 3)
#> ┌────────────────────┬───────┬───────┐
#> │ node               ┆ start ┆ end   │
#> │ ---                ┆ ---   ┆ ---   │
#> │ str                ┆ u64   ┆ u64   │
#> ╞════════════════════╪═══════╪═══════╡
#> │ optimization       ┆ 0     ┆ 6     │
#> │ sort(Sepal.Length) ┆ 6     ┆ 543   │
#> │ group_by(Species)  ┆ 546   ┆ 54345 │
#> └────────────────────┴───────┴───────┘