Fetch n rows of a LazyFrame

Description

This is similar to $collect() but limits the number of rows to collect. It is mostly useful for checking that a query works as expected.

Usage

<LazyFrame>$fetch(
  n_rows = 500,
  ...,
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  streaming = FALSE,
  no_optimization = FALSE,
  inherit_optimization = FALSE
)

Arguments

n_rows Integer. Maximum number of rows to fetch.
... Ignored.
type_coercion Logical. Coerce types such that operations succeed and run on minimal required memory.
predicate_pushdown Logical. Applies filters as early as possible at scan level.
projection_pushdown Logical. Select only the columns that are needed at the scan level.
simplify_expression Logical. Various optimizations, such as constant folding and replacing expensive operations with faster alternatives.
slice_pushdown Logical. Only load the required slice from the scan level. Don’t materialize sliced outputs (e.g. join$head(10)).
comm_subplan_elim Logical. Will try to cache branching subplans that occur on self-joins or unions.
comm_subexpr_elim Logical. Common subexpressions will be cached and reused.
streaming Logical. Run parts of the query in a streaming fashion (this is in an alpha state).
no_optimization Logical. Sets the following parameters to FALSE: predicate_pushdown, projection_pushdown, slice_pushdown, comm_subplan_elim, comm_subexpr_elim.
inherit_optimization Logical. Use existing optimization settings regardless of the settings specified in this function call.
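
As a minimal sketch of how these arguments are passed, the call below runs the same lazy query twice, once with the default optimizations and once with no_optimization = TRUE. The filter on Sepal.Length is only an illustrative query, not part of this page; because n_rows is followed by ..., the optimization flags have to be passed by name.

```r
library(polars)

# Illustrative lazy query (an assumption, not taken from this page).
lf <- pl$LazyFrame(iris)$filter(pl$col("Sepal.Length") > 5)

# Default optimizations, limited to 5 rows at the start of the query.
res_default <- lf$fetch(5)

# Same query with the pushdown and caching optimizations switched off.
# Flags after `...` must be named.
res_plain <- lf$fetch(5, no_optimization = TRUE)

res_default
res_plain
```

The two results are not guaranteed to have the same number of rows, since the optimizer can change where in the plan the filter runs relative to the row limit.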

Details

$fetch() does not guarantee the final number of rows in the DataFrame output. It only guarantees that n_rows rows are used at the beginning of the query. Filters, join operations, and a scanned file that contains fewer rows than requested all influence the final number of rows.
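
To make the distinction concrete, the sketch below reuses the appending query from the Examples section and compares $fetch(3) with limiting the final result instead. It assumes that LazyFrame has a $head() method and that DataFrame exposes $height, neither of which is documented on this page.

```r
library(polars)

# Query from the Examples section: one row is appended inside the query.
lf <- pl$LazyFrame(iris)$
  select(pl$col("Species")$append("flora gigantica, alien"))

# fetch() limits the rows used at the start of the query, so rows added
# later in the query still appear in the result.
sampled <- lf$fetch(3)

# head() (assumed available on LazyFrame) limits the final result instead,
# so the collected output has no more than 3 rows.
capped <- lf$head(3)$collect()

sampled$height
capped$height
```

When an exact cap on the output size is needed, limiting the final result in this way is the safer choice; $fetch() is better suited to quick checks of a query on a small sample.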

Value

A DataFrame computed from at most n_rows rows at the start of the query. The final number of rows can differ (see Details).

See Also

  • $collect() - regular collect.
  • $profile() - same as $collect() but also returns a table with each operation profiled.
  • $collect_in_background() - non-blocking collect that returns a future handle. Can also be used via $collect(collect_in_background = TRUE).
  • $sink_parquet() - streams the query to a Parquet file.
  • $sink_ipc() - streams the query to an Arrow IPC file.

Examples

library(polars)

# fetch 3 rows
pl$LazyFrame(iris)$fetch(3)
#> shape: (3, 5)
#> ┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
#> │ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species │
#> │ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
#> │ f64          ┆ f64         ┆ f64          ┆ f64         ┆ cat     │
#> ╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
#> │ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa  │
#> └──────────────┴─────────────┴──────────────┴─────────────┴─────────┘
# this fetch-query returns 4 rows, because we started with 3 and appended one
# row in the query (see section 'Details')
pl$LazyFrame(iris)$
  select(pl$col("Species")$append("flora gigantica, alien"))$
  fetch(3)
#> shape: (4, 1)
#> ┌────────────────────────┐
#> │ Species                │
#> │ ---                    │
#> │ str                    │
#> ╞════════════════════════╡
#> │ setosa                 │
#> │ setosa                 │
#> │ setosa                 │
#> │ flora gigantica, alien │
#> └────────────────────────┘