Drop duplicated rows

Description

Drop duplicated rows from this DataFrame.

Usage

<DataFrame>$unique(subset = NULL, ..., keep = "any", maintain_order = FALSE)

Arguments

`subset`	A character vector with the names of the column(s) to use to identify duplicates. If `NULL` (default), use all columns.
`...`	Not used.
`keep`	Which of the duplicate rows to keep: `"any"` (default): no guarantee is given about which row is kept, which allows more optimizations. `"first"`: keep the first occurrence of each unique row. `"last"`: keep the last occurrence of each unique row. `"none"`: drop every row that has a duplicate.
`maintain_order`	Keep the same row order as the original data. Setting this to `TRUE` is more expensive to compute and prevents the query from running on the streaming engine.
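To make the `keep` options concrete, here is a rough base-R analogue built on `duplicated()` for a plain `data.frame`. It only mirrors which rows are selected and is not how polars computes the result. The helper name `unique_rows` is hypothetical. It treats `"any"` like `"first"`, which is just one valid outcome, and it always preserves the original row order.

```r
# Hypothetical helper (not part of polars): mimics the row selection of `keep`.
# Result always keeps the original row order; "any" falls through to "first".
unique_rows = function(d, subset = names(d), keep = c("any", "first", "last", "none")) {
  keep = match.arg(keep)
  key = d[, subset, drop = FALSE]
  mask = switch(keep,
    any = ,
    first = !duplicated(key),                       # first occurrence wins
    last = !duplicated(key, fromLast = TRUE),       # last occurrence wins
    none = !(duplicated(key) | duplicated(key, fromLast = TRUE))  # only singletons
  )
  d[mask, , drop = FALSE]
}

d = data.frame(
  x = c(1:3, 1:3, 3:1, 1L),
  y = c(1:3, 1:3, 1:3, 1L)
)
nrow(unique_rows(d))                         # 5
unique_rows(d, subset = "x", keep = "last")  # rows 7, 8 and 10
unique_rows(d, keep = "none")                # rows 7 and 9
```

The selected rows match the polars examples on this page. Only their order differs, because polars with `maintain_order = FALSE` makes no order guarantee.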

Value

DataFrame

Examples

library(polars)

df = pl$DataFrame(
  x = c(1:3, 1:3, 3:1, 1L),
  y = c(1:3, 1:3, 1:3, 1L)
)
df$height

#> [1] 10

df$unique()$height

#> [1] 5

# use `subset` to judge uniqueness on column x only, keeping the last or first
# occurrence (row order is not guaranteed when maintain_order = FALSE)
df$unique(subset = "x", keep = "last")

#> shape: (3, 2)
#> ┌─────┬─────┐
#> │ x   ┆ y   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 2   ┆ 2   │
#> │ 1   ┆ 1   │
#> │ 3   ┆ 1   │
#> └─────┴─────┘

df$unique(subset = "x", keep = "first")

#> shape: (3, 2)
#> ┌─────┬─────┐
#> │ x   ┆ y   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 3   ┆ 3   │
#> │ 2   ┆ 2   │
#> │ 1   ┆ 1   │
#> └─────┴─────┘

# keep only rows that have no duplicates at all
df$unique(keep = "none")

#> shape: (2, 2)
#> ┌─────┬─────┐
#> │ x   ┆ y   │
#> │ --- ┆ --- │
#> │ i32 ┆ i32 │
#> ╞═════╪═════╡
#> │ 3   ┆ 1   │
#> │ 1   ┆ 3   │
#> └─────┴─────┘
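The examples above use the default `maintain_order = FALSE`, so their row order is arbitrary. The following is a minimal sketch, using only the arguments shown under Usage, of requesting the original order instead. Under `keep = "first"` with `subset = "x"`, the first three rows of `df` should be returned in their original positions.

```r
library(polars)

df = pl$DataFrame(
  x = c(1:3, 1:3, 3:1, 1L),
  y = c(1:3, 1:3, 1:3, 1L)
)

# maintain_order = TRUE returns the kept rows in their original order,
# at extra cost and without streaming-engine support
out = df$unique(subset = "x", keep = "first", maintain_order = TRUE)
out
```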